Question
I am fairly new to Elasticsearch and have a question about stop words. I have an index that contains US state names, for example New York/NY, California/CA, Oregon/OR. I believe Oregon's abbreviation, 'OR', is a stop word, so after I insert the state data into the index I cannot search on 'OR'. Is there a way to set up custom stop words for this, or am I doing something wrong?
Here is how I am building the index:
curl -XPUT http://localhost:9200/test/state/1 -d '{"stateName": ["California","CA"]}'
curl -XPUT http://localhost:9200/test/state/2 -d '{"stateName": ["New York","NY"]}'
curl -XPUT http://localhost:9200/test/state/3 -d '{"stateName": ["Oregon","OR"]}'
A search for 'NY' works fine. For example:
curl -XGET 'http://localhost:9200/test/state/_search?pretty=1' -d '
{
  "query": {
    "match": {
      "stateName": "NY"
    }
  }
}'
But a search for 'OR' returns zero hits:
curl -XGET 'http://localhost:9200/test/state/_search?pretty=1' -d '
{
  "query": {
    "match": {
      "stateName": "OR"
    }
  }
}'
I believe this search returns no results because 'OR' is a stop word, but I don't know how to work around this. Thanks for your help.
Recommended answer
You can (and definitely should) control the way your data is indexed by modifying the mapping to suit your data and the way you want to search it.
In your case I would disable stop words for that specific field rather than modify the stop word list, though you could do the latter if you prefer. The point is that you are using the default mapping, which is a great starting point, but as you can see it needs tweaking to fit your needs.
For each field, you can specify which analyzer to use. An analyzer defines how text is split into the tokens that get indexed (the tokenizer), plus any further changes made to each token, including removing tokens or adding new ones, via token filters.
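To make the tokenizer-plus-filters idea concrete, here is a rough shell imitation of what an analysis chain with a stop filter does to the field value from the question. This is not Elasticsearch code. The `analyze` function and the short stop list are made up for illustration, on the assumption that the real default stop list in use contains "or".

```shell
# Rough imitation of an analysis chain; NOT Elasticsearch itself.
# analyze TEXT [STOPWORDS]:
#   tokenizer (split on whitespace) -> lowercase filter -> stop filter
analyze() {
  text=$1
  stopwords=${2:-}            # space-separated list; empty means no stop filter
  for token in $text; do      # unquoted on purpose: word splitting acts as the tokenizer
    token=$(printf '%s' "$token" | tr '[:upper:]' '[:lower:]')
    case " $stopwords " in
      *" $token "*) ;;        # token is a stop word: drop it
      *) printf '%s\n' "$token" ;;
    esac
  done
}

# Assumed sample of an English stop list that contains "or".
analyze "Oregon OR" "a an and or the"   # prints only: oregon
analyze "Oregon OR"                     # prints: oregon, then or
```

With the stop filter in place, the token "or" never reaches the index, so a match query for 'OR' has nothing to hit. Without it, both tokens survive.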
You can specify your mapping when you create the index, or update it afterwards using the put mapping API (as long as the changes you make are backwards compatible).
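As a sketch of the "specify it while creating the index" route, the snippet below writes out an index body that defines an analyzer with stop words turned off and assigns it to the `stateName` field. Several things here are assumptions rather than part of the original answer: the analyzer name `state_no_stop` and the file name `state_index.json` are invented, and the `state` mapping type and `string` field type assume an older Elasticsearch release like the one the question's URLs suggest (newer releases use `text` and have no mapping types). Check the details against the documentation for your version.

```shell
# Sketch only: "state_no_stop" and "state_index.json" are made-up names.
# The body assumes an older Elasticsearch with mapping types and the "string" field type.
cat > state_index.json <<'EOF'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "state_no_stop": {
          "type": "standard",
          "stopwords": "_none_"
        }
      }
    }
  },
  "mappings": {
    "state": {
      "properties": {
        "stateName": { "type": "string", "analyzer": "state_no_stop" }
      }
    }
  }
}
EOF

# With a node running on localhost:9200 and no existing "test" index,
# the body could then be sent with:
#   curl -XPUT 'http://localhost:9200/test' -d @state_index.json
```

Changing the analyzer of a field that already exists is generally not a backwards-compatible mapping change, so in practice this likely means deleting and recreating the `test` index and then re-inserting the three state documents.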