我从一个csv文件导入了数据并创建了许多节点,所有节点都与同一数据集中基于“树编号”层次结构系统的其他节点相关:
例如,树编号为a01.111的节点是节点a01的直接子节点,树编号为a01.111.230的节点是节点a01.111的直接子节点。
我要做的是在节点之间创建唯一的关系,这些节点是其他节点的直接子节点。例如,节点a01.111.230应该只有一个与节点a01.111的“is_child_of”关系。
我尝试过几件事,例如:

MATCH (n:Node), (n2:Node)
WHERE (n2.treeNumber STARTS WITH n.treeNumber)
AND (n <> n2)
AND NOT ((n2)-[:IS_CHILD_OF]->())
CREATE UNIQUE (n2)-[:IS_CHILD_OF]->(n);

此示例导致创建唯一的“is-child-of”关系,但不创建与节点的直接父节点的关系。相反,节点a01.111.230将与节点a01相关。

最佳答案

我想建议另一个通用的解决方案,也就是避免像@inversefalcon指出的那样使用笛卡尔积。
我们首先创建一个索引,以便更快地查找,然后插入一些测试数据:

CREATE CONSTRAINT ON (n:Node) ASSERT n.treeNumber IS UNIQUE;
CREATE (n:Node {treeNumber: 'A01.111.230'})
CREATE (n:Node {treeNumber: 'A01.111'})
CREATE (n:Node {treeNumber: 'A01'})

然后我们需要扫描所有节点作为潜在的父节点,并查找以父节点的treeNumber开始(STARTS WITH可以使用索引)并且在treeNumber的“余数”中没有点(即直接子节点)的子节点,而不是拆分、连接等:
MATCH (p:Node), (c:Node)
WHERE c.treeNumber STARTS WITH p.treeNumber
  AND p <> c
  AND NOT substring(c.treeNumber, length(p.treeNumber) + 1) CONTAINS '.'
RETURN p, c

为了分析的目的,我用一个简单的RETURN替换了关系的创建,但是您可以用CREATE UNIQUEMERGE替换它。
实际上,我们可以通过预先计算应该匹配的实际前缀来去掉p <> c谓词和长度上的+ 1
MATCH (p:Node)
WITH p, p.treeNumber + '.' AS parentNumber
MATCH (c:Node)
WHERE c.treeNumber STARTS WITH parentNumber
  AND NOT substring(c.treeNumber, length(parentNumber)) CONTAINS '.'
RETURN p, c

但是,分析该查询表明没有使用索引,并且存在笛卡尔积(因此我们有一个o(n^2)算法):
Compiler CYPHER 3.0

Planner COST

Runtime INTERPRETED

+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| Operator           | Estimated Rows | Rows | DB Hits | Variables            | Other                                                                                                                              |
+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +ProduceResults    |              2 |    2 |       0 | c, p                 | p, c                                                                                                                               |
| |                  +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +Filter            |              2 |    2 |      26 | c, p, parentNumber   | NOT(Contains(SubstringFunction(c.treeNumber,length(parentNumber),None),{  AUTOSTRING1})) AND StartsWith(c.treeNumber,parentNumber) |
| |                  +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +Apply             |              2 |    9 |       0 | p, parentNumber -- c |                                                                                                                                    |
| |\                 +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| | +NodeByLabelScan |              9 |    9 |      12 | c                    | :Node                                                                                                                              |
| |                  +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +Projection        |              3 |    3 |       3 | parentNumber -- p    | p; Add(p.treeNumber,{  AUTOSTRING0})                                                                                               |
| |                  +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +NodeByLabelScan   |              3 |    3 |       4 | p                    | :Node                                                                                                                              |
+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+

Total database accesses: 45

但是,如果我们简单地添加这样的提示
MATCH (p:Node)
WITH p, p.treeNumber + '.' AS parentNumber
MATCH (c:Node)
USING INDEX c:Node(treeNumber)
WHERE c.treeNumber STARTS WITH parentNumber
  AND NOT substring(c.treeNumber, length(parentNumber)) CONTAINS '.'
RETURN p, c

它确实使用索引,我们有类似于o(n*log(n))算法(log(n)用于索引查找):
Compiler CYPHER 3.0

Planner COST

Runtime INTERPRETED

+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| Operator                      | Estimated Rows | Rows | DB Hits | Variables            | Other                                                                                    |
+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +ProduceResults               |              2 |    2 |       0 | c, p                 | p, c                                                                                     |
| |                             +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +Filter                       |              2 |    2 |       6 | c, p, parentNumber   | NOT(Contains(SubstringFunction(c.treeNumber,length(parentNumber),None),{  AUTOSTRING1})) |
| |                             +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +Apply                        |              2 |    3 |       0 | p, parentNumber -- c |                                                                                          |
| |\                            +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| | +NodeUniqueIndexSeekByRange |              9 |    3 |       6 | c                    | :Node(treeNumber STARTS WITH parentNumber)                                               |
| |                             +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +Projection                   |              3 |    3 |       3 | parentNumber -- p    | p; Add(p.treeNumber,{  AUTOSTRING0})                                                     |
| |                             +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +NodeByLabelScan              |              3 |    3 |       4 | p                    | :Node                                                                                    |
+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+

Total database accesses: 19

注意,在前面介绍创建前缀的WITH步骤时,我确实有点作弊,因为我注意到它改进了执行计划和数据库访问
MATCH (p:Node), (c:Node)
USING INDEX c:Node(treeNumber)
WHERE c.treeNumber STARTS WITH p.treeNumber
  AND p <> c
  AND NOT substring(c.treeNumber, length(p.treeNumber) + 1) CONTAINS '.'
RETURN p, c

其执行计划如下:
Compiler CYPHER 3.0

Planner RULE

Runtime INTERPRETED

+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| Operator     | Rows | DB Hits | Variables | Other                                                                                                                      |
+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| +Filter      |    2 |       9 | c, p      | NOT(p == c) AND NOT(Contains(SubstringFunction(c.treeNumber,Add(length(p.treeNumber),{  AUTOINT0}),None),{  AUTOSTRING1})) |
| |            +------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| +SchemaIndex |    6 |      12 | c -- p    | PrefixSeekRangeExpression(p.treeNumber); :Node(treeNumber)                                                                 |
| |            +------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| +NodeByLabel |    3 |       4 | p         | :Node                                                                                                                      |
+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+

Total database accesses: 25

最后,对于记录,我编写的原始查询(即没有提示)的执行计划是:
Compiler CYPHER 3.0

Planner COST

Runtime INTERPRETED

+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Operator           | Estimated Rows | Rows | DB Hits | Variables | Other                                                                                                                                                                |
+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +ProduceResults    |              2 |    2 |       0 | c, p      | p, c                                                                                                                                                                 |
| |                  +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +Filter            |              2 |    2 |      21 | c, p      | NOT(p == c) AND StartsWith(c.treeNumber,p.treeNumber) AND NOT(Contains(SubstringFunction(c.treeNumber,Add(length(p.treeNumber),{  AUTOINT0}),None),{  AUTOSTRING1})) |
| |                  +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +CartesianProduct  |              9 |    9 |       0 | p -- c    |                                                                                                                                                                      |
| |\                 +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| | +NodeByLabelScan |              3 |    9 |      12 | c         | :Node                                                                                                                                                                |
| |                  +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +NodeByLabelScan   |              3 |    3 |       4 | p         | :Node                                                                                                                                                                |
+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Total database accesses: 37

不是更糟的:没有提示但有预先计算的前缀的是!这就是为什么你应该经常测量。

08-25 04:05