Question
I am trying to get data from a MySQL database via REGEXP, matching words with or without special UTF-8 characters.
Let me explain with an example:
If the user enters a word like sirena, it should return rows that include words like sirena, siréna, šíreňá, and so on. It should also work the other way round: when he enters siréná, it should return the same results.
I am trying to search for it via REGEXP; my query looks like this:
SELECT * FROM `content` WHERE `text` REGEXP '[sšŠ][iíÍ][rŕŔřŘ][eéÉěĚ][nňŇ][AaáÁäÄ0]'
It works only when the word in the database is sirena, but not when the word is siréňa.
Is this because of a problem with UTF-8 and MySQL? (The collation of the MySQL column is utf8_general_ci.)
Thanks!
Recommended answer
MySQL's regular expression library does not support UTF-8.
See Bug #30241, "Regular expression problems", which has been open since 2007. They will have to change the regular expression library they use before that can be fixed, and I haven't found any announcement of when or if they will do this.
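To see why the query in the question fails, it may help to compare character-aware and byte-wise matching outside MySQL. The following Python sketch only approximates the library's behaviour, on the assumption that it matches one byte per character class; the names `char_regex`, `byte_regex` and `matches` are made up for this illustration.

```python
import re

# The pattern from the question, unchanged.
PATTERN = "[sšŠ][iíÍ][rŕŔřŘ][eéÉěĚ][nňŇ][AaáÁäÄ0]"

# Character-aware matching: each class consumes one whole character.
char_regex = re.compile(PATTERN)

# Byte-wise matching (assumed here to approximate the old MySQL library):
# each class consumes a single byte, so a two-byte letter such as "é"
# (0xC3 0xA9) can never be consumed by one class.
byte_regex = re.compile(PATTERN.encode("utf-8"))


def matches(word):
    """Return (character-aware result, byte-wise result) for a word."""
    return (
        char_regex.search(word) is not None,
        byte_regex.search(word.encode("utf-8")) is not None,
    )


print(matches("sirena"))  # plain ASCII word
print(matches("siréňa"))  # word with two-byte letters
```

Under that assumption, sirena matches both ways, while siréňa matches only the character-aware version: the fourth class consumes the first byte of é, and the second byte of é is not in the fifth class.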
The only workaround I've seen is to search for specific HEX strings:
mysql> SELECT * FROM `content` WHERE HEX(`text`) REGEXP 'C3A9C588';
+----------+
| text |
+----------+
| siréňa |
+----------+
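Writing such hex patterns by hand gets tedious, so the application could generate them. Here is a minimal Python sketch of that idea, assuming the pattern is built client-side and then passed to a query of the form HEX(`text`) REGEXP ...; the `VARIANTS` table (adapted from the character classes in the question) and the helper names `fold`, `hex_utf8` and `hex_pattern` are hypothetical.

```python
import unicodedata

# Hypothetical variant table, adapted from the character classes in the question.
VARIANTS = {
    "s": "sšŠ",
    "i": "iíÍ",
    "r": "rŕŔřŘ",
    "e": "eéÉěĚ",
    "n": "nňŇ",
    "a": "AaáÁäÄ",
}


def fold(word):
    """Lower-case the word and strip diacritics, e.g. 'siréná' becomes 'sirena'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def hex_utf8(text):
    """Upper-case hex of the UTF-8 bytes, the same form that HEX() returns."""
    return text.encode("utf-8").hex().upper()


def hex_pattern(word):
    """Build one alternation group of hex-encoded variants per letter."""
    groups = []
    for letter in fold(word):
        alternatives = VARIANTS.get(letter, letter)
        groups.append("(" + "|".join(hex_utf8(v) for v in alternatives) + ")")
    return "".join(groups)


print(hex_pattern("siréná"))
```

Because the input is folded first, sirena and siréná produce the same pattern. One caveat of matching on hex text is that a match can in principle start in the middle of a byte or of a multi-byte character, so the results may need a second check in the application.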
Re your comment:
No, I don't know of any solution with MySQL.
You might have to switch to PostgreSQL, because that RDBMS supports \u codes for UTF characters in its regular expression syntax.