Problem description
I'm working on a backend that needs to store universal characters. I've chosen the utf8mb4 table encoding for that purpose. I also have to choose a table collation.
The most straightforward option is utf8mb4_general_ci. Besides the general one, there are about 20 other collations to choose from. What is the purpose of the more specific ones? Does utf8mb4_general_ci, or maybe utf8mb4_unicode_520_ci, cover all of them? Which one should I use if I want to store characters ranging from Chinese all the way to Arabic?
Recommended answer
...general_ci is simple. It does not equate a 2-character combination (such as a base letter followed by a non-spacing mark) with the single-character equivalent.
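To make the "2-character combination" point concrete, here is a minimal Python sketch of the underlying Unicode concept. It is not MySQL's implementation. It only shows that a precomposed character and its base-letter-plus-mark spelling are different code point sequences that compare equal after canonical normalization. The helper name canonical_equal is made up for illustration.

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9).
composed = "\u00e9"
# "e" followed by U+0301 COMBINING ACUTE ACCENT, a non-spacing mark.
decomposed = "e\u0301"


def canonical_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain code point comparison sees two different strings.
print(composed == decomposed)
# After normalization the two spellings are treated as the same text.
print(canonical_equal(composed, decomposed))
```

A collation that ignores this equivalence behaves like the first comparison; a Unicode-derived collation is meant to behave more like the second.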
...unicode_520_ci comes from Unicode version 5.2.0, the latest version available when MySQL picked up on it. It handles things like having an ordering for Emoji, which previous versions did not have.
With MySQL 8.0, the preferred collation is utf8mb4_0900_ai_ci, based on Unicode 9.0.
...<language>_ci handles variations found in the given language. For example, it decides whether ch and ll in Spanish should be treated as "letters" that sort between cz and d, and between lz and m, respectively.
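The Spanish example can be sketched in Python as a custom sort key. This is only an illustration of the idea, not how MySQL implements its language-specific collations. The alphabet table and the name spanish_key are assumptions made for this sketch, and accented vowels are deliberately ignored.

```python
# Simplified traditional Spanish alphabet in which "ch" and "ll" count as
# letters of their own (assumed table for illustration only).
ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: index for index, letter in enumerate(ALPHABET)}


def spanish_key(word: str) -> list:
    """Turn a word into a list of letter ranks, reading ch/ll as one letter."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the table sort after everything else.
            key.append(RANK.get(word[i], len(ALPHABET)))
            i += 1
    return key


words = ["dedo", "chico", "czar", "cuna", "llama", "luz", "mano"]
print(sorted(words))                   # plain code point order
print(sorted(words, key=spanish_key))  # ch after cz, ll after lz
```

With the plain sort, chico comes before cuna and llama before luz. With the custom key, chico lands between czar and dedo, and llama between luz and mano.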
For general use, do not use ...general_ci; use the latest version derived from Unicode. For language-specific situations, pick one of the other collations.
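As a rough sketch of that advice, the following Python snippet picks a Unicode-derived collation by MySQL major version and builds a CREATE TABLE statement with it. The function names, the messages table, and its columns are hypothetical. The version cutoff only encodes the two collations named above and assumes the older server has utf8mb4_unicode_520_ci available.

```python
def pick_collation(mysql_major_version: int) -> str:
    """Return a Unicode-derived utf8mb4 collation (illustrative rule only)."""
    if mysql_major_version >= 8:
        return "utf8mb4_0900_ai_ci"    # based on Unicode 9.0
    return "utf8mb4_unicode_520_ci"    # based on Unicode 5.2.0


def create_table_sql(table: str, mysql_major_version: int) -> str:
    """Build DDL for a hypothetical table using the chosen collation."""
    collation = pick_collation(mysql_major_version)
    return (
        f"CREATE TABLE {table} (\n"
        "  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n"
        "  body TEXT NOT NULL\n"
        f") DEFAULT CHARACTER SET utf8mb4 COLLATE {collation};"
    )


print(create_table_sql("messages", 8))
```

Running SHOW COLLATION on the target server is a reasonable way to confirm that the chosen name actually exists there before using it.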
I do not know how (or even whether) Chinese and Arabic are sorted differently in the different collations. However, I see ...persian_ci, so I suspect there is an issue.
Do use utf8mb4, not utf8, especially since you need Chinese.
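As background that is not spelled out in the answer itself: MySQL's utf8 character set is an alias for utf8mb3, which stores at most 3 bytes per character, while utf8mb4 allows 4. The Python sketch below shows which sample characters would need the fourth byte. The helper names and the sample characters are chosen here purely for illustration.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_bytes(text: str) -> bool:
    """True if every character needs at most 3 UTF-8 bytes."""
    return all(utf8_len(ch) <= 3 for ch in text)


samples = {
    "Arabic letter ain": "\u0639",
    "common Chinese character": "\u4e2d",
    "rare Chinese character (CJK Extension B)": "\U00020000",
    "emoji": "\U0001F600",
}

for label, ch in samples.items():
    print(f"{label}: {utf8_len(ch)} bytes, fits in 3 bytes: {fits_in_three_bytes(ch)}")
```

Common Chinese and Arabic characters fit in 3 bytes or fewer, but rarer Chinese characters and Emoji do not, which is where the 3-byte limit becomes a problem.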