MongoDB diacritic-insensitive text search does not return all accented rows (words with diacritical marks), and vice versa

Problem description

I have a document collection with the following structure:

uid, name

with a text index:

db.Collection.createIndex({name: "text"})

It contains the following data:

1, iphone
2, iphóne
3, iphonë
4, iphónë

When I run a text search for iphone, I get only two records, which is unexpected:

Actual output
--------------
1, iphone
2, iphóne

If I search for iphonë:

db.Collection.find( { $text: { $search: "iphonë"} } );

I am getting
---------------------
3, iphonë
4, iphónë

But I actually expect the following output from either of these queries:

db.Collection.find( { $text: { $search: "iphone"} } );
db.Collection.find( { $text: { $search: "iphónë"} } );

    Expected output
    ------------------
    1, iphone
    2, iphóne
    3, iphonë
    4, iphónë

Am I missing something here? How can I get the expected output above when searching for iphone or iphónë?

Recommended answer

Since MongoDB 3.2, text indexes are diacritic insensitive:

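With version 3, the text index is diacritic insensitive. That is, the index does not distinguish between characters that contain diacritical marks and their non-marked counterparts, such as é, ê, and e. More specifically, the text index strips the characters categorized as diacritics in the Unicode 8.0 Character Database Prop List.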
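To make the idea of stripping diacritics concrete, here is a minimal JavaScript sketch. It is not MongoDB's implementation: the helper name `foldDiacritics` is hypothetical, and it approximates the Unicode diacritic property by decomposing each string (NFD) and removing combining marks.

```javascript
// Hypothetical helper for illustration only; MongoDB's text index
// does its own folding based on the Unicode diacritic property.
function foldDiacritics(text) {
  // NFD splits a character such as "é" into "e" plus a combining accent.
  // \p{M} matches combining marks, which are then removed.
  return text.normalize("NFD").replace(/\p{M}/gu, "");
}

// The four sample values from the question.
const names = ["iphone", "iphóne", "iphonë", "iphónë"];
const folded = names.map(foldDiacritics);
console.log(folded);
```

Under this approximation all four names fold to the same string, iphone, which is why a diacritic-insensitive index would be expected to treat them as one term.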

So the following queries should work:

db.Collection.find( { $text: { $search: "iphone"} } );
db.Collection.find( { name: { $regex: "iphone"} } );

But it looks like there is a bug with the dieresis (¨), even though it is categorized as a diacritic in the Unicode 8.0 list (issue on JIRA: SERVER-29918).

Since MongoDB 3.4, you can use collation, which allows you to perform this kind of query.

For example, to get your expected output, run the following query:

db.Collection.find({name: "iphone"}).collation({locale: "en", strength: 1})

This will output:

{ "_id" : 1, "name" : "iphone" }
{ "_id" : 2, "name" : "iphône" }
{ "_id" : 3, "name" : "iphonë" }
{ "_id" : 4, "name" : "iphônë" }

In the collation, strength is the level of comparison to perform:

  • 1 : base character only
  • 2 : diacritic sensitive
  • 3 : case sensitive + diacritic sensitive
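The effect of these levels can be tried outside MongoDB with JavaScript's built-in `Intl.Collator`. This is only a rough analogy: it assumes that the `sensitivity` values "base", "accent", and "variant" behave like strengths 1, 2, and 3 for these sample strings, and it does not query a database.

```javascript
// Stand-ins for collation strength 1, 2 and 3 (an assumed mapping).
const base = new Intl.Collator("en", { sensitivity: "base" });
const accent = new Intl.Collator("en", { sensitivity: "accent" });
const variant = new Intl.Collator("en", { sensitivity: "variant" });

// compare() returns 0 when the collator treats two strings as equal.
const sameAtBase = base.compare("iphone", "iphónë") === 0;
const sameAtAccent = accent.compare("iphone", "iphónë") === 0;
const caseAtAccent = accent.compare("iphone", "IPHONE") === 0;
const caseAtVariant = variant.compare("iphone", "IPHONE") === 0;

console.log({ sameAtBase, sameAtAccent, caseAtAccent, caseAtVariant });
```

Only the base-level comparison treats iphone and iphónë as equal, which matches the behaviour wanted from `strength: 1` in the query above. The accent level still ignores case, and the variant level distinguishes both accents and case.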

