问题描述
我正在重写我们的数据库类(基于 PDO),并被困在这个问题上.在 PHP 和 MySQL 中使用 UTF-8 时,我被教导要同时使用 SET NAMES utf8
和 SET CHARACTER SET utf8
.
I´m rewritting our database class (PDO based), and got stuck at this. I´ve been taught to both use SET NAMES utf8
and SET CHARACTER SET utf8
when working with UTF-8 in PHP and MySQL.
在 PDO 中,我现在想使用 PDO::MYSQL_ATTR_INIT_COMMAND
参数,但它只支持一个查询.
In PDO I now want to use the PDO::MYSQL_ATTR_INIT_COMMAND
parameter, but it only supports one query.
是否需要SET CHARACTER SET utf8
?
推荐答案
在使用 SET NAMES utf8
之后使用 SET CHARACTER SET utf8
实际上会重置 character_set_connection
和 collation_connection
到@@character_set_database
和 @@collation_database
分别.
Using SET CHARACTER SET utf8
after using SET NAMES utf8
will actually reset the character_set_connection
and collation_connection
to@@character_set_database
and @@collation_database
respectively.
手册指出
SET NAMES x
等价于
SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;
和SET CHARACTER SET x
等价于
SET character_set_client = x;
SET character_set_results = x;
SET collation_connection = @@collation_database;
而 SET collation_connection = x
也在内部执行 SET character_set_connection = <<character_set_of_collation_x>>
和 SET character_set_connection = x
也在内部执行执行 SET collation_connection = <<default_collation_of_character_set_x
.
whereas SET collation_connection = x
also internally executes SET character_set_connection = <<character_set_of_collation_x>>
and SET character_set_connection = x
internally also executes SET collation_connection = <<default_collation_of_character_set_x
.
所以本质上,您将 character_set_connection
重置为 @@character_set_database
并将 collation_connection
重置为 @@collation_database
.手册解释了这些变量的用法:
So essentially you're resetting character_set_connection
to @@character_set_database
and collation_connection
to @@collation_database
. The manual explains the usage of these variables:
服务器应该使用什么字符集将语句翻译为 after收到了吗?
为此,服务器使用character_set_connection 和collation_connection 系统变量.它转换由发送的语句客户端从 character_set_client 到character_set_connection(除了字符串字面量介绍人,如 _latin1 或 _utf8).collation_connection 对文字字符串的比较.为了字符串与列的比较值,collation_connection 不很重要,因为列有自己的整理,具有更高的排序规则优先.
For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.
总而言之,MySQL 用来处理查询及其结果的编码/转码过程是一个多步骤的事情:
To sum this up, the encoding/transcoding procedure MySQL uses to process the query and its results is a multi-step-thing:
- MySQL 将传入查询视为在
character_set_client
中编码. - MySQL 将语句从
character_set_client
转码为character_set_connection
- 当比较字符串值与列值时,MySQL 将字符串值从
character_set_connection
转码为给定数据库列的字符集,并使用列排序规则进行排序和比较. - MySQL 构建以
character_set_results
编码的结果集(这包括结果数据以及结果元数据,例如列名等)
- MySQL treats the incoming query as being encoded in
character_set_client
. - MySQL transcodes the statement from
character_set_client
intocharacter_set_connection
- when comparing string values to column values MySQL transcodes the string value from
character_set_connection
into the character set of the given database column and uses the column collation to do sorting and comparison. - MySQL builds up the result set encoded in
character_set_results
(this includes result data as well as result metadata such as column names and so on)
因此,SET CHARACTER SET utf8
可能不足以提供完整的 UTF-8 支持.考虑 latin1
的默认数据库字符集和用 utf8
-charset 定义的列,并执行上述步骤.由于 latin1
无法覆盖 UTF-8 可以覆盖的所有字符,因此您可能会在步骤 3 中丢失字符信息.
So it could be the case that a SET CHARACTER SET utf8
would not be sufficient to provide full UTF-8 support. Think of a default database character set of latin1
and columns defined with utf8
-charset and go through the steps described above. As latin1
cannot cover all the characters that UTF-8 can cover you may lose character information in step 3.
- 步骤3:鉴于您的查询以 UTF-8 编码并且包含无法用
latin1
表示的字符,这些字符将在从utf8
转码到latin1
(默认数据库字符集)时丢失,使您的查询失败.
- Step 3: Given that your query is encoded in UTF-8 and contains characters that cannot be represented with
latin1
, these characters will be lost on transcoding fromutf8
tolatin1
(the default database character set) making your query fail.
所以我认为可以肯定地说 SET NAMES ...
是处理字符集问题的正确方法.尽管我可能会补充说,正确设置 MySQL 服务器变量(所有必需的变量都可以在 my.cnf
中静态设置)可以将您从每次连接所需的额外查询的性能开销中解放出来.
So I think it's safe to say that SET NAMES ...
is the correct way to handle character set issues. Even though I might add that setting up your MySQL server variables correctly (all the required variables can be set statically in your my.cnf
) frees you from the performance overhead of the extra query required on every connect.
这篇关于是“SET CHARACTER SET utf8"吗?必要的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!