Problem description
I have a document collection with the following structure:
uid, name
with this index:
db.Collection.createIndex({name: "text"})
It contains the following data:
1, iphone
2, iphóne
3, iphonë
4, iphónë
When I do a text search for iphone, I get only two records, which is unexpected:
Actual output
--------------
1, iphone
2, iphóne
If I search for iphonë:
db.Collection.find( { $text: { $search: "iphonë"} } );
I get:
---------------------
3, iphonë
4, iphónë
But I actually expect the following output from both of these queries:
db.Collection.find( { $text: { $search: "iphone"} } );
db.Collection.find( { $text: { $search: "iphónë"} } );
Expected output
------------------
1, iphone
2, iphóne
3, iphonë
4, iphónë
Am I missing something here? How can I get the expected output above by searching for either iphone or iphónë?
Recommended answer
Since MongoDB 3.2, text indexes are diacritic insensitive:
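With version 3, the text index is diacritic insensitive. That is, the index does not distinguish between characters that contain diacritical marks and their non-marked counterparts, such as é, ê, and e. More specifically, the text index strips the characters categorized as diacritics in the Unicode 8.0 Character Database Prop List.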
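To make the quoted rule concrete, here is a minimal JavaScript sketch (for Node.js, not the mongo shell) that approximates "strip the characters categorized as diacritics" with Unicode NFD normalization and the `\p{Diacritic}` regex property escape. The helper name `stripDiacritics` is hypothetical, and this is only an approximation of the documented behaviour, not MongoDB's text-index code; it also uses the Unicode version bundled with the JavaScript engine rather than Unicode 8.0.

```javascript
// Hypothetical helper, not MongoDB's tokenizer: decompose to NFD so that
// "ó" becomes "o" + U+0301, then drop every code point that has the
// Unicode Diacritic property.
function stripDiacritics(token) {
  return token.normalize("NFD").replace(/\p{Diacritic}/gu, "");
}

// The four values from the question.
const names = ["iphone", "iphóne", "iphonë", "iphónë"];

// If the dieresis were stripped like the acute accent, every value would
// reduce to the same term as the search string "iphone".
for (const name of names) {
  console.log(`${name} -> ${stripDiacritics(name)}`);
}
```

Under this rule all four names reduce to `iphone`, which is the result the question expects from the text search.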
So the following queries should work:
db.Collection.find( { $text: { $search: "iphone"} } );
db.Collection.find( { name: { $regex: "iphone"} } );
But it looks like there is a bug with the dieresis (¨), even though it is categorized as a diacritic in the Unicode 8.0 list (issue on JIRA: SERVER-29918).
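One way to check the categorization this relies on is to decompose a word and test each code point against the Unicode Diacritic property. The Node.js sketch below (`describeCodePoints` is a hypothetical name) shows that "ë" decomposes into "e" plus U+0308 COMBINING DIAERESIS, which carries that property. It only illustrates the Unicode data, using whatever Unicode version the JavaScript engine ships; it does not reproduce the MongoDB bug itself.

```javascript
// Hypothetical helper: lists each code point of the NFD form of a word and
// whether it has the Unicode Diacritic property.
function describeCodePoints(word) {
  return Array.from(word.normalize("NFD"), (ch) => ({
    codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
    diacritic: /\p{Diacritic}/u.test(ch),
  }));
}

// "iphonë" decomposes into six base letters followed by U+0308.
for (const { codePoint, diacritic } of describeCodePoints("iphonë")) {
  console.log(diacritic ? `${codePoint} Diacritic` : codePoint);
}
```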
Since MongoDB 3.4, you can use a collation, which allows you to perform this kind of query.
For example, to get your expected output, run the following query:
db.Collection.find({name: "iphone"}).collation({locale: "en", strength: 1})
This will output:
{ "_id" : 1, "name" : "iphone" }
{ "_id" : 2, "name" : "iphóne" }
{ "_id" : 3, "name" : "iphonë" }
{ "_id" : 4, "name" : "iphónë" }
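MongoDB's collation support is built on ICU, and Node.js exposes ICU collation through the standard `Intl.Collator`. As a rough in-memory analogy (not the MongoDB driver API), `sensitivity: "base"` plays the role of `strength: 1`: only base letters are compared. The `docs` array and the `findByName` helper below are hypothetical stand-ins for the collection and the query.

```javascript
// "base" sensitivity compares base letters only, ignoring accents and case,
// which is the behaviour strength: 1 asks for in a MongoDB collation.
const primary = new Intl.Collator("en", { sensitivity: "base" });

// Hypothetical in-memory stand-in for the collection from the question.
const docs = [
  { _id: 1, name: "iphone" },
  { _id: 2, name: "iphóne" },
  { _id: 3, name: "iphonë" },
  { _id: 4, name: "iphónë" },
];

// Hypothetical analogue of find({ name: value }).collation(...):
// a document matches when the collator reports the strings as equal.
function findByName(collection, value, collator) {
  return collection.filter((doc) => collator.compare(doc.name, value) === 0);
}

console.log(findByName(docs, "iphone", primary).map((d) => d._id));
```

Searching for "iphónë" with the same collator also matches all four documents, since equality at this level ignores the accents on both sides.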
In the collation, strength is the level of comparison to perform:
- 1: base characters only (diacritics and case are ignored)
- 2: diacritic sensitive (case is still ignored)
- 3: case sensitive + diacritic sensitive
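The three levels can be tried out with the same `Intl.Collator` analogy in Node.js. The mapping below from strength to `sensitivity` ("base" for 1, "accent" for 2, "variant" for 3) is an assumption based on how the Intl sensitivities are documented, not something MongoDB defines; the names `sensitivityForStrength` and `equalAtStrength` are hypothetical.

```javascript
// Assumed correspondence between MongoDB collation strength and
// Intl.Collator sensitivity; an analogy, not MongoDB's own mapping.
const sensitivityForStrength = { 1: "base", 2: "accent", 3: "variant" };

// Hypothetical helper: true when the two strings compare equal
// at the given strength level.
function equalAtStrength(a, b, strength) {
  const collator = new Intl.Collator("en", {
    sensitivity: sensitivityForStrength[strength],
  });
  return collator.compare(a, b) === 0;
}

for (const strength of [1, 2, 3]) {
  const accentOnly = equalAtStrength("iphone", "iphóne", strength);
  const caseOnly = equalAtStrength("iphone", "IPHONE", strength);
  console.log(`strength ${strength}: accent-only equal=${accentOnly}, case-only equal=${caseOnly}`);
}
```

Only level 1 treats "iphone" and "iphóne" as equal, which is why the query above uses `strength: 1`.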
This concludes the article on MongoDB diacriticInSensitive search not showing all accented rows (words with diacritics) as expected, and vice versa.