本文介绍了在MySQL中使用重音敏感主键的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在MySQL中具有重音敏感的主键.

Have an accent sensitive primary key in MySQL.

我有一个独特的单词表,所以我将单词本身用作主键(顺便说一句,如果有人可以给我一些建议,我不知道它是否是一个好的设计/实践).

I have a table of unique words, so I use the word itself as a primary key (by the way if someone can give me an advice about it, I have no idea if it's a good design/practice or not).

我需要使该字段具有重音(为什么不区分大小写),因为它必须区分例如法语动词"demander"的两个不同的变体形式的'demandé''demande'.我在数据库中存储重音词没有任何问题.我只是不能插入两个没有重音的重音字符串.

I need that field to be accent (and why not case) sensitive, because it must distinguish between, for instance, 'demandé' and 'demande', two different inflexions of the French verb "demander". I do not have any problem to store accented words in the database. I just can't insert two accented characters strings that are identical when unaccented.

在尝试使用以下查询创建'demandé'行时:

When trying to create the 'demandé' row with the following query:

INSERT INTO `corpus`.`token` (`name_token`) VALUES ('demandé');

我收到此错误:

ERROR 1062: 1062: Duplicate entry 'demandé' for key 'PRIMARY'

问题:

  • 在该过程中应该进行哪些修改,以便在该表中具有用于"demande"和demandé"的两个不同的唯一主键?
  • 解决方案 在表声明中使用"collat​​e utf8_general_ci"

    • 如何进行对口音敏感的查询?以下是正确的方法:SELECT * FROM corpus.token WHERE name_token = 'demandé' COLLATE utf8_bin
    • How can i make accent sensitive queries ? Is the following the right way :SELECT * FROM corpus.token WHERE name_token = 'demandé' COLLATE utf8_bin

    解决方案 在WHERE语句中使用'collat​​e utf8_bin'

    • 我发现我可以通过使用BINARY关键字实现这一点(请参阅此 sqlFiddle ). collatebinary有什么区别?

    • I found that i can achieve this point by using the BINARY Keyword (see this sqlFiddle). What is the difference between collate and binary?

    是否可以保留其他表的任何更改? (无论如何,我都必须重建该表,因为它有点凌乱)

    Can I preserve other tables from any changes ? (I'll have to rebuild that table anyway, because it's kind of messy)

    我对使用MySQL编码不太满意.我对该数据库中的编码还没有任何问题(我很幸运,因为我的数据可能并不总是使用相同的编码...而且我对此无能为力).我觉得对口音敏感"问题的任何修改都可能导致其他查询或数据完整性方面的编码问题.我应该担心吗?

    I'm not very comfortable with encoding in MySQL. I don't have any problem yet with encoding in that database (and I'm kind of lucky because my data might not always use the same encoding... and there is not much I can do about it). I have a feeling that any modification regarding to that "accent sensitive" issue might create some encoding issue with other queries or data integrity. Am I right to be concerned?

    数据库创建:

    CREATE DATABASE corpus DEFAULT CHARACTER SET utf8;
    

    不重复字词表:

    CREATE TABLE token (name_token VARCHAR(50), freq INTEGER, CONSTRAINT pk_token PRIMARY KEY (name_token))
    

    查询

    SELECT * FROM corpus.token WHERE name_token = 'demande';
    SELECT * FROM corpus.token WHERE name_token = 'demandé';
    

    都返回同一行:

    demande
    

    推荐答案

    整理.您有两个选择,而不是三个:

    Collations. You have two choices, not three:

    utf8_bin将所有这些都视作 :demandédemandeDemandé.

    utf8_bin treats all of these as different: demandé and demande and Demandé.

    utf8_..._ci(通常为utf8_general_ciutf8_unicode_ci)将所有这些都视为相同:demandédemandeDemandé.

    utf8_..._ci (typically utf8_general_ci or utf8_unicode_ci) treats all of these as the same: demandé and demande and Demandé.

    如果只需要区分大小写(demandé = demande,但都不匹配Demandé),那么您就不走运了.

    If you want only case sensitivity (demandé = demande, but neither match Demandé), you are out of luck.

    如果您只想要重音灵敏度(demandé = Demandé,但都不匹配demande),那么您就不走运了.

    If you want only accent sensitivity (demandé = Demandé, but neither match demande), you are out of luck.

    声明.做任何事情的最佳方法:

    Declaration. The best way to do whatever you pick:

    CREATE TABLE (
        name VARCHAR(...)  CHARACTER SET utf8  COLLATE utf8_...  NOT NULL,
        ...
        PRIMARY KEY(name)
    )
    

    不要即时更改排序规则.如果name中的排序规则不同,则不会使用索引(即会很慢):

    Don't change collation on the fly. This won't use the index (that is, will be slow) if the collation is different in name:

    WHERE name = ... COLLATE ...
    

    BINARY . 数据类型 BINARYVARBINARYBLOB非常类似于CHARVARCHARTEXTCOLLATE ..._bin.也许唯一的区别是将检查文本是否存储在VARCHAR ... COLLATE ..._bin中有效的utf8,但在存储到VARBINARY...中时将不对其进行检查. 比较(WHEREORDER BY等)将相同;也就是说,只需比较这些位,不要进行大小写折叠或重音剥离等.

    BINARY. The datatypes BINARY, VARBINARY and BLOB are very much like CHAR, VARCHAR, and TEXT with COLLATE ..._bin. Perhaps the only difference is that text will be checked for valid utf8 storing in a VARCHAR ... COLLATE ..._bin, but it will not be checked when storing into VARBINARY.... Comparisons (WHERE, ORDER BY, etc) will be the same; that is, simply compare the bits, don't do case folding or accent stripping, etc.

    这篇关于在MySQL中使用重音敏感主键的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-29 02:43