NEST query for exact text match

Problem description

I am trying to write a NEST query that returns results based on an exact string match. I have researched this on the web, and there are suggestions to use Term, Match, or MatchPhrase. I have tried all of them, but my searches return results that contain only part of the search string. For example, my database has the following rows of email addresses:

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

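Regardless of whether I use:

// Term query: the search value is not analyzed
client.Search<Emails>(s => s
    .From(0)
    .Size(MaximumSearchResultsSize)
    .Query(q => q.Term(p => p.OnField(fieldName).Value(fieldValue))));

or:

// Match query: the search value is analyzed before matching
client.Search<Emails>(s => s
    .From(0)
    .Size(MaximumPaymentSearchResults)
    .Query(q => q.Match(p => p.OnField(fieldName).Query(fieldValue))));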

My search results always include rows that contain only part of the search string.

So, if I provide the search string "ter", I still get all 3 rows:

[email protected]

[email protected]

[email protected]

I expect to see no rows returned if the search string is "ter". If the search string is "[email protected]", then I would like to see only "[email protected]".

I am not sure what I am doing wrong.

Recommended answer

Based on the information you have provided in the question, it sounds like the field that contains the email address has been indexed with the Standard Analyzer. This is the default analyzer applied to string fields when no other analyzer has been specified and the field is not marked as not_analyzed.

We can check what the standard analyzer does to an email address using the Elasticsearch Analyze API:

curl -XPOST "http://localhost:9200/_analyze?analyzer=standard&text=ter%40gmail.com"

The text input needs to be URL encoded, as demonstrated here with the @ symbol (%40). The results of running this query are:

{
   "tokens": [
      {
         "token": "ter",
         "start_offset": 0,
         "end_offset": 3,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "gmail.com",
         "start_offset": 4,
         "end_offset": 13,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}

We can see that the standard analyzer produces two tokens for the input, ter and gmail.com, and these are what will be stored in the inverted index for the field.
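To make the tokenization step concrete, here is a minimal Python sketch that mimics this behaviour for simple email-like input. It is only a rough stand-in: the real standard analyzer follows Unicode text segmentation rules, and the regular expression and the function name `toy_standard_analyze` are assumptions made for illustration.

```python
import re

# Rough stand-in for the standard analyzer (an assumption, not the real
# tokenizer): lowercase the text, then keep runs of letters and digits,
# allowing single dots inside a run. Characters such as "@" act as separators.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")


def toy_standard_analyze(text):
    """Return the list of tokens this toy analyzer produces for `text`."""
    return TOKEN_RE.findall(text.lower())


tokens = toy_standard_analyze("ter@gmail.com")
print(tokens)  # ['ter', 'gmail.com']
```

The address is split at the @ symbol, so the full address never appears as a single token.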

Now, running a Match query causes the input to the match query to be analyzed, by default with the same analyzer as the one found in the mapping definition of the field the match query is applied to.

The tokens resulting from this analysis are then combined, by default, into a boolean OR query, so that any document containing any one of the tokens in the inverted index for the field is a match. For the example text [email protected], this means any document with a match for ter or gmail.com in the field is a hit:

// Indexing
input: [email protected] -> standard analyzer -> ter,gmail.com in inverted index

// Querying
input: [email protected] -> match query -> docs with ter or gmail.com are a hit!

Clearly, for an exact match, this is not what we intend at all!

Running a Term query leaves its input unanalyzed, i.e. it looks for an exact match to the term as given. Running it against a field that was analyzed at index time can be a problem, however: the field value has been analyzed but the term input has not, so the hits you get are documents whose indexed tokens exactly match the term input. For example:

// Indexing
input: [email protected] -> standard analyzer -> ter,gmail.com in inverted index

// Querying
input: [email protected] -> term query -> No exact matches for [email protected]

input: ter -> term query -> docs with ter in inverted index are a hit!

This is not what we want either!
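The match and term behaviour described above can be reproduced with a toy inverted index. The following Python sketch is not how Elasticsearch is implemented. The tokenizer is a crude approximation of the standard analyzer, the function names are made up for illustration, and the documents other than ter@gmail.com are hypothetical.

```python
import re
from collections import defaultdict

# Crude approximation of the standard analyzer (assumption for illustration).
TOKEN_RE = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")


def toy_analyze(text):
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Analyze every field value and map each token to the ids containing it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in toy_analyze(value):
            index[token].add(doc_id)
    return index


def match_query(index, text):
    """Analyze the input, then OR together the hits for each token."""
    hits = set()
    for token in toy_analyze(text):
        hits |= index.get(token, set())
    return hits


def term_query(index, term):
    """Look the input up verbatim, without analyzing it."""
    return set(index.get(term, set()))


# Hypothetical documents; only the first address appears in the text above.
docs = {1: "ter@gmail.com", 2: "ter@example.org", 3: "sam@gmail.com"}
index = build_index(docs)

print(match_query(index, "ter@gmail.com"))  # {1, 2, 3}
print(term_query(index, "ter@gmail.com"))   # set()
print(term_query(index, "ter"))             # {1, 2}
```

The match query hits every document that shares either token, while the term query finds nothing for the full address because no such token was ever indexed.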

What we probably want to do with this field is set it to not_analyzed in the mapping definition:

putMappingDescriptor
    .MapFromAttributes()
    .Properties(p => p
        // store the field value as a single, unanalyzed term
        .String(s => s.Name(n => n.FieldName).Index(FieldIndexOption.NotAnalyzed))
    );

With this in place, we can search for exact matches with a Term filter using a Filtered query:

// change dynamic to your type
var docs = client.Search<dynamic>(b => b
    .Query(q => q
        .Filtered(fq => fq
            .Filter(f => f
                .Term("fieldName", "[email protected]")
            )
        )
    )
);

This produces the following query DSL:

{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "fieldName": "[email protected]"
        }
      }
    }
  }
}
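As a rough illustration of why not_analyzed gives the desired result, the toy Python model below indexes each field value as a single term and then looks it up verbatim. The function names and the documents other than ter@gmail.com are hypothetical, and this is a sketch of the idea rather than of Elasticsearch internals.

```python
from collections import defaultdict


def build_keyword_index(docs):
    """Index each value as one whole term, as a not_analyzed field would."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        index[value].add(doc_id)  # no tokenizing, no lowercasing
    return index


def term_filter(index, term):
    """Return the ids of documents whose stored value equals `term` exactly."""
    return set(index.get(term, set()))


# Hypothetical documents; only the first address appears in the text above.
docs = {1: "ter@gmail.com", 2: "ter@example.org", 3: "sam@gmail.com"}
index = build_keyword_index(docs)

print(term_filter(index, "ter"))            # set()
print(term_filter(index, "ter@gmail.com"))  # {1}
```

Because the value is stored verbatim, the comparison is also case-sensitive, so "TER@gmail.com" would not match in this model.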

08-21 05:12