What is the fastest way to import into Neo4j?

Problem description

I have a list of JSON documents, in the format:

[{a:1, b:[2,5,6]}, {a:2, b:[1,3,5]}, ...]

What I need to do is make nodes with parameter a, and connect them to all the nodes in the list b that have that value for a. So the first node will connect to nodes 2, 5 and 6. Right now I'm using Python's neo4jrestclient to populate, but it's taking a long time. Is there a faster way to populate?

Currently, this is my script:

break_list = []
for each in ans[1:]:
    ref = each[0]
    # Use a parameterized query: node patterns need parentheses (MATCH (n)),
    # and parameters avoid Cypher-injection problems with arbitrary URLs.
    q = """MATCH (n) WHERE n.url = {ref} RETURN n;"""
    n1 = gdb.query(q, params={"ref": ref}, returns=client.Node)[0][0]
    for link in each[6]:
        if len(link) > 4:
            text, link = link.split('!__!')
            q2 = """MATCH (n) WHERE n.url = {link} RETURN n;"""
            try:
                n2 = gdb.query(q2, params={"link": link}, returns=client.Node)
                n1.relationships.create("Links", n2[0][0], anchor_text=text)
            except Exception:
                # Record pairs whose target node could not be found or linked.
                break_list.append((ref, link))

Recommended answer

You might want to consider converting your JSON to CSV (using something like jq), then you could use the LOAD CSV Cypher tool for import. LOAD CSV is optimized for data import, so you will get much better performance with this method. With your example, the LOAD CSV script would look something like this:

Your JSON converted to CSV:

"a","b"
"1","2,5,6"
"2","1,3,5"
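If you prefer to stay in Python rather than use jq, the conversion above can be sketched with the standard library alone. This is a minimal sketch assuming the field names `a` and `b` from the sample documents (note the keys are quoted here to make the input valid JSON):

```python
import csv
import io
import json

# Sample documents in the question's format (keys quoted to be valid JSON).
docs = json.loads('[{"a": 1, "b": [2, 5, 6]}, {"a": 2, "b": [1, 3, 5]}]')

out = io.StringIO()
writer = csv.writer(out, quoting=csv.QUOTE_ALL)
writer.writerow(["a", "b"])
for doc in docs:
    # Join the b-list into a single comma-separated cell, matching the CSV above.
    writer.writerow([doc["a"], ",".join(str(x) for x in doc["b"])])

print(out.getvalue())
```

Writing the whole b-list into one quoted cell keeps the CSV to one row per document; the Cypher script below then splits that cell back apart with `split(row.b, ',')`.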

First create a uniqueness constraint / index. This ensures only one node is created for any "name", and creates an index for faster lookup performance.

CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;

Given the above CSV file, this Cypher script can be used to efficiently import the data:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///path/to/file.csv" AS row
MERGE (a:Person{name: row.a})
WITH a,row
UNWIND split(row.b,',') AS other
MERGE (b:Person {name:other})
CREATE UNIQUE (a)-[:CONNECTED_TO]->(b);

Other options

Another option is to use the JSON as a parameter in a Cypher query and then iterate through each element of the JSON array using UNWIND.

WITH {d} AS json
UNWIND json AS doc
MERGE (a:Person{name: doc.a})
WITH doc, a
UNWIND doc.b AS other
MERGE (b:Person{name:other})
CREATE UNIQUE (a)-[:CONNECTED_TO]->(b);
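From Python, the `{d}` placeholder is bound by passing the parsed JSON list as a query parameter. The sketch below shows the parameter payload; the final `gdb.query(...)` call is commented out because it needs a running Neo4j server, and the `params=` keyword is an assumption about the neo4jrestclient API rather than something stated in the answer:

```python
import json

# The JSON list from the question (keys quoted to be valid JSON).
docs = json.loads('[{"a": 1, "b": [2, 5, 6]}, {"a": 2, "b": [1, 3, 5]}]')

query = """
WITH {d} AS json
UNWIND json AS doc
MERGE (a:Person {name: doc.a})
WITH doc, a
UNWIND doc.b AS other
MERGE (b:Person {name: other})
CREATE UNIQUE (a)-[:CONNECTED_TO]->(b);
"""

# The whole list is sent once, bound to the {d} placeholder in the query.
params = {"d": docs}
# Assumed call shape for neo4jrestclient (requires a live server):
# gdb.query(query, params=params)
```

Sending the whole array in one request trades round-trips for a single large transaction, which is where the performance concerns with very large arrays come from.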

Note that there might be some performance issues with a very large JSON array; see some examples of this here and here.

