Problem description
I am having trouble making a diacritic-insensitive search work with Arabic text.
I have tested several setups for the table in question: the utf8 and utf16 character sets, with the collations utf8_general_ci, utf16_general_ci and utf16_unicode_ci.
The search works for special characters such as å and ä. For example:
select * from test where text like '%a%'
This returns rows where text is a, å or ä. But it does not work with Arabic diacritics: if the stored text is بِسْمِ and I search for بسم, I get no hits.
Any ideas how to get past this?
The real usage will later be in PHP (a search function), but I am working directly in the MySQL database for testing before I port it over to PHP.
(From the comments)
CREATE TABLE test (
  id int(11) unsigned NOT NULL AUTO_INCREMENT,
  text text COLLATE utf8_unicode_ci,
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Recommended answer
(This is not an "answer", but a "resolution".)
It seems that LIKE does not work with your Arabic string. I don't know what else it fails on. I recommend you file a bug report at http://bugs.mysql.com . Here is a test case showing that neither LIKE '...' nor LIKE '%...%' finds both strings, whereas '=' does:
CREATE TABLE so28863402 (
id int(11) unsigned NOT NULL AUTO_INCREMENT,
txt text COLLATE utf8_unicode_ci, -- deliberate choice of COLLATION
PRIMARY KEY (id)
) ENGINE=InnoDB
DEFAULT CHARSET=utf8;
INSERT INTO so28863402 (txt) VALUES
(UNHEX('D8A8D990D8B3D992D985D990')), -- Using hex to avoid any copy/paste issues
(UNHEX('D8A8D8B3D985')); -- The values should compare equal
SELECT id, txt, HEX(txt) FROM so28863402;
SELECT txt, COUNT(*) FROM so28863402 GROUP BY txt; -- GROUP BY finds them equal.
SELECT * FROM so28863402
  WHERE txt = 'بسم'; -- Finds both rows (correct)
SELECT * FROM so28863402
  WHERE txt LIKE '%بسم%'; -- Finds one row (incorrect)
-- Further checks:
SELECT * FROM so28863402 WHERE txt = UNHEX( 'D8A8D8B3D985' );
SELECT * FROM so28863402 WHERE txt LIKE UNHEX( 'D8A8D8B3D985' );
SELECT * FROM so28863402 WHERE txt LIKE UNHEX('25D8A8D8B3D98525'); -- x25 is '%'
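Until the LIKE behavior is sorted out, one possible workaround is to strip the diacritics from the column value before matching. The MySQL manual notes that LIKE matches on a per-character basis and can therefore give different results from =, which may be what is happening here. The query below is only a sketch against the so28863402 test table above, not something from the original question or a tested fix: it assumes the data is stored as utf8 and removes only the two diacritics that appear in the test data.

```sql
-- Sketch only: strips the two diacritics that occur in the test data
-- (kasra U+0650 = D990, sukun U+0652 = D992 in UTF-8) before applying LIKE.
-- A real search would need one REPLACE per diacritic it should ignore.
SELECT id, txt
FROM so28863402
WHERE REPLACE(REPLACE(txt, _utf8 X'D990', ''), _utf8 X'D992', '')
      LIKE CONCAT('%', _utf8 X'D8A8D8B3D985', '%');  -- pattern is %بسم%
```

With the two rows inserted above, this is intended to match both of them. Because the expression wraps the column, it cannot use an index. For a larger table, keeping the stripped text in a separate column and searching that column may be the more practical variant of the same idea.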