Problem description
I'm trying to use the Neo4j 2.1.5 regex matching in Cypher and running into problems.
I need to implement a full text search on specific fields that a user has access to. The access requirement is key and is what prevents me from just dumping everything into a Lucene instance and querying that way. The access system is dynamic and so I need to query for the set of nodes that a particular user has access to and then within those nodes perform the search. I would really like to match the set of nodes against a Lucene query, but I can't figure out how to do that so I'm just using basic regex matching for now. My problem is that Neo4j doesn't always return the expected results.
For example, I have about 200 nodes with one of them being the following:
( i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})
This query produces one result:
MATCH (p)-->(:group)-->(i:node)
WHERE (i.name =~ "(?i).*mosaic.*")
RETURN i
> Returned 1 row in 569 ms
But this query produces zero results even though the description property matches the expression:
MATCH (p)-->(:group)-->(i:node)
WHERE (i.description =~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 601 ms
And this query also produces zero results even though it includes the name property which returned results previously:
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
WHERE (searchText =~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 487 ms
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
RETURN searchText
>
...
SotoLinear Glass Mosaic Tiles Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
...
Even more odd, if I search for a different term, it returns all of the expected results without a problem.
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
WHERE (searchText =~ "(?i).*plumbing.*")
RETURN i
> Returned 8 rows in 522 ms
I then tried to cache the search text on the nodes and I added an index to see if that would change anything, but it still didn't produce any results.
CREATE INDEX ON :node(searchText)
MATCH (p)-->(:group)-->(i:node)
WHERE (i.searchText =~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 3182 ms
I then tried to simplify the data to reproduce the problem, but in this simple case it works as expected:
MERGE (i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})
WITH i, (
i.name + " " + COALESCE(i.description, "")
) AS searchText
WHERE searchText =~ "(?i).*mosaic.*"
RETURN i
> Returned 1 rows in 630 ms
I tried using the CYPHER 2.1.EXPERIMENTAL tag as well but that didn't change any of the results. Am I making incorrect assumptions on how the regex support works? Is there something else I should try or some other way to debug the problem?
Additional information
Here is a sample call that I make to the Cypher Transactional Rest API when creating my nodes. This is the actual plain text that is sent (other than some formatting for easier reading) when adding nodes to the database. Any string encoding is just standard URL encoding that is performed by Go when creating a new HTTP request.
{"statements":[
{
"parameters":
{
"p01":"lsF30nP7TsyFh",
"p02":
{
"description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
"id":"lsF3BxzFdn0kj",
"name":"Linear Glass Mosaic Tiles",
"object":"material"
}
},
"resultDataContents":["row"],
"statement":
"MATCH (p:project { id: { p01 } })
WITH p
CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material { p02 })"
}
]}
If it is an encoding issue, why does a search on name work, description not work, and name + description not work? Is there any way to examine the database to see if/how the data was encoded? When I perform searches, the text returned appears correct.
Recommended answer
A few notes:
- You should probably replace CREATE UNIQUE with MERGE (which works a bit differently).
- For your full-text search I would go with the Lucene legacy index for performance, if your group restriction is not selective enough to keep the response time below a few ms.
I just tried your exact json statement, and it works perfectly.
Insert:
curl -H accept:application/json -H content-type:application/json -d @insert.json \
-XPOST http://localhost:7474/db/data/transaction/commit
json:
{"statements":[
{
"parameters":
{
"p01":"lsF30nP7TsyFh",
"p02":
{
"description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
"id":"lsF3BxzFdn0kj",
"name":"Linear Glass Mosaic Tiles",
"object":"material"
}
},
"resultDataContents":["row"],
"statement":
"MERGE (p:project { id: { p01 } })
WITH p
CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material { p02 }) RETURN m"
}
]}
Query:
MATCH (p)-->(:group)-->(i:material)
WHERE (i.description =~ "(?i).*mosaic.*")
RETURN i
Returns:
name: Linear Glass Mosaic Tiles
id: lsF3BxzFdn0kj
description: Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
object: material
To check your data, you can look at the JSON or CSV dumps that the browser offers (the little download icons on the result and table-result views).
Or use neo4j-shell with my shell-import-tools to output CSV or GraphML and check those files.
Or use a bit of java (or groovy) code to check your data.
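As one way to do that kind of check, here is a minimal sketch that lists every control or non-ASCII character in a string together with its position. It is written in Python rather than the Java or Groovy suggested above, and it assumes you have already copied the property value out of one of the dumps; `hidden_positions` and the sample string are hypothetical names and data, not anything taken from the question's database.

```python
def hidden_positions(text: str) -> list[tuple[int, str]]:
    """List (index, hex code point) for characters outside printable ASCII."""
    return [
        (i, hex(ord(ch)))
        for i, ch in enumerate(text)
        if ord(ch) < 32 or ord(ch) > 126
    ]


def reveal_hidden(text: str) -> str:
    """Render control and non-ASCII characters as visible escape sequences."""
    return text.encode("unicode_escape").decode("ascii")


# Hypothetical value with a Windows-style line break hidden in the middle.
sample = "bathroom.\r\nThe colors work very well"

print(hidden_positions(sample))
print(reveal_hidden(sample))
```

An empty list from `hidden_positions` would suggest the stored text is plain printable ASCII, so the cause has to be looked for elsewhere.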
There is also the consistency-checker that comes with the neo4j-enterprise download. Here is a blog post on how to run it.
java -cp 'lib/*:system/lib/*' org.neo4j.consistency.ConsistencyCheckTool /tmp/foo
I added a groovy test script here: https://gist.github.com/jexp/5a183c3501869ee63d30
Sometimes the text spans multiple lines, and there are two more flags for that:
- multiline (?m), which makes ^ and $ match at the start and end of every line
- dotall (?s), which allows the dot to also match special characters such as newlines
So you could try (?ism).*mosaic.*
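The effect of these flags can be reproduced outside Neo4j. The sketch below uses Python's `re` module as a stand-in for the Java regex engine behind Cypher's `=~`. In both, the dot stops at a newline by default, and `re.fullmatch` mirrors the fact that `=~` has to match the entire string. The helper name `cypher_like_match` and the sample text are hypothetical, and the embedded newline is an assumption, since the question does not show whether the stored description contains one.

```python
import re


def cypher_like_match(pattern: str, text: str) -> bool:
    """Approximate Cypher's =~, which must match the whole string."""
    return re.fullmatch(pattern, text) is not None


# Hypothetical description with an embedded line break (an assumption).
text = "Introducing our new tiles.\nWe also carry a small subway mosaic."

# Case-insensitive only: the dot cannot cross the newline, so no full match.
print(cypher_like_match(r"(?i).*mosaic.*", text))

# Adding dotall lets the dot cross the newline.
print(cypher_like_match(r"(?is).*mosaic.*", text))

# Multiline alone only changes ^ and $, so it does not help here.
print(cypher_like_match(r"(?im).*mosaic.*", text))
```

If the real data behaves the same way, that would also explain why the name-only search succeeds (a single-line value) while any search that includes the description fails.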