问题描述
我正在为使用Mac(如果相关)的一位同事设置Rails环境.我已经从实时mysql数据库中提取了数据,并使用该数据创建了本地开发数据库.如果我打开mysql控制台,并查看其名称字段中具有扩展字符集字符的记录的数据,则它看起来很好.但是,在Rails控制台(以及Rails生成的网页)中,编码被破坏了:例如,将Endash替换为â€".
I'm setting a rails environment up for one of my colleagues, who's using a mac (in case that's relevant). I've pulled the data down from our live mysql database and made a local development database with that data. If i open the mysql console, and look at the data for a record which has extended charset characters in its name field, then it looks fine. However, in the rails console (and in a rails-generated web page) the encoding is broken: an endash is replaced by "â€"" for example.
我所知道的唯一与此相关的rails config选项位于config/database.yml中.我目前有这个设置:
The only rails config options i know about that are relevant to this is in config/database.yml. I currently have this set:
encoding: utf8
collation: utf8_general_ci
例如使它在我的机器上可以正常工作.但是就像我说的那样,它不能在我同事的机器上工作.有任何想法吗?
which makes it work fine on my machine for example. But like i say it's not working on my colleague's machine. Any ideas anyone?
在实时服务器上,我从中复制了数据,字符集信息如下所示:
EDIT 1: on the live server, where i copied the data FROM, the charset info looks like this:
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
为了响应@eggyal的评论,我已经做了几个mysqldumps,这已经非常有启发性了.这是第一个转储:
EDIT 2: in response to @eggyal's comment i've done a couple of mysqldumps, which has been quite revealing. Here's the first dump:
$ mysqldump -u root -h127.0.0.1 dbname lessons --where="id=79510"
-- MySQL dump 10.11
--
-- Host: 127.0.0.1 Database: e_learning_resource_v3
-- ------------------------------------------------------
-- Server version 5.0.32-Debian_7etch4-log
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
--
-- Table structure for table `lessons`
--
DROP TABLE IF EXISTS `lessons`;
CREATE TABLE `lessons` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) default NULL,
`description` text,
`user_id` int(11) default NULL,
`created_at` datetime default NULL,
`privacy` int(11) default '1',
`is_official` tinyint(1) default '0',
`is_readonly` tinyint(1) default NULL,
`comments_allowed` tinyint(1) default NULL,
`hours` int(11) default NULL,
`sessions` int(11) default NULL,
`updated_at` datetime default NULL,
`custom_menu_swf` varchar(255) default NULL,
`pupil_liked_at` datetime default NULL,
`user_liked_at` datetime default NULL,
`pupil_favorite_count` int(11) default '0',
`user_favorite_count` int(11) default '0',
`teacher_notes` text,
`pupil_notes` text,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Dumping data for table `lessons`
--
-- WHERE: id=79510
LOCK TABLES `lessons` WRITE;
/*!40000 ALTER TABLE `lessons` DISABLE KEYS */;
INSERT INTO `lessons` VALUES (79510,'Jazzâ€"Man',NULL,NULL,'2014-04-03 12:08:05',1,0,NULL,NULL,NULL,NULL,'2014-04-03 12:08:05',NULL,NULL,NULL,0,0,NULL,NULL);
/*!40000 ALTER TABLE `lessons` ENABLE KEYS */;
UNLOCK TABLES;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;
-- Dump completed on 2014-04-03 11:16:42
因此,这只是一个简单的mysqldump,并且在"INSERT INTO Lessons"行中有破损的字符(爵士).
So, this was just a straight mysqldump and it's got the broken character in it (Jazzâ€"Man) in the "INSERT INTO lessons" line.
我使用一些其他选项再次进行了操作,转储文件中的数据看起来正常:
I do it again with some extra options, and the data looks ok in the dump file:
$ mysqldump -u root -h127.0.0.1 dbname lessons --extended-insert --single-transaction --default-character-set=latin1 --skip-set-charset --where="id=79510"
-- MySQL dump 10.11
--
-- Host: 127.0.0.1 Database: e_learning_resource_v3
-- ------------------------------------------------------
-- Server version 5.0.32-Debian_7etch4-log
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
--
-- Table structure for table `lessons`
--
DROP TABLE IF EXISTS `lessons`;
CREATE TABLE `lessons` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) default NULL,
`description` text,
`user_id` int(11) default NULL,
`created_at` datetime default NULL,
`privacy` int(11) default '1',
`is_official` tinyint(1) default '0',
`is_readonly` tinyint(1) default NULL,
`comments_allowed` tinyint(1) default NULL,
`hours` int(11) default NULL,
`sessions` int(11) default NULL,
`updated_at` datetime default NULL,
`custom_menu_swf` varchar(255) default NULL,
`pupil_liked_at` datetime default NULL,
`user_liked_at` datetime default NULL,
`pupil_favorite_count` int(11) default '0',
`user_favorite_count` int(11) default '0',
`teacher_notes` text,
`pupil_notes` text,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Dumping data for table `lessons`
--
-- WHERE: id=79510
LOCK TABLES `lessons` WRITE;
/*!40000 ALTER TABLE `lessons` DISABLE KEYS */;
INSERT INTO `lessons` VALUES (79510,'Jazz–Man',NULL,NULL,'2014-04-03 12:08:05',1,0,NULL,NULL,NULL,NULL,'2014-04-03 12:08:05',NULL,NULL,NULL,0,0,NULL,NULL);
/*!40000 ALTER TABLE `lessons` ENABLE KEYS */;
UNLOCK TABLES;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;
-- Dump completed on 2014-04-03 11:18:20
所以,看起来好像额外的选项可以解决问题:
So, it looks like the extra options did the trick:
--extended-insert --single-transaction --default-character-set=latin1 --skip-set-charset
推荐答案
当MySQL客户端与服务器交互时:
When a MySQL client interacts with the server:
-
服务器仅以字节字符串的形式接收任何文本;客户以前会告诉它如何编码此类文本.
the server receives any text merely as a string of bytes; the client will have previously told it how such text would be encoded.
如果服务器随后必须将该文本存储在表中,则它必须将其代码转换为相关列的编码(如果不同).
if the server then has to store that text in a table, it must transcode it to the encoding of the relevant column (if different).
如果客户端随后要检索此类文本,则服务器必须将其转码为客户端期望的编码.
if the client subsequently wants to retrieve such text, the server must transcode it to the encoding expected by the client.
如果客户端在第1步和第3步中使用的编码是 same (通常是这种情况,特别是当两种情况下的客户端是同一应用程序时),那么通常就不会引起注意.如果客户端使用的编码不是它说的那种编码.例如,假设客户端告诉MySQL它将使用latin1
,但实际上是在utf8
中发送数据:
If the encodings used by the client in steps 1 and 3 are the same (which is usually the case, especially when the client in both cases is the same application), then it often goes unnoticed if the client is using an encoding other than the one it said it would. For example, suppose the client tells MySQL that it will use latin1
, but actually sends data in utf8
:
-
字符串
'Jazz–Man'
以0x4a617a7ae280934d616e
的形式发送到UTF-8服务器.
The string
'Jazz–Man'
is sent to the server in UTF-8 as0x4a617a7ae280934d616e
.
MySQL在Windows-1252中将这些字节解码,可以理解为代表字符串'Jazzâ€Man'
.
MySQL, decoding those bytes in Windows-1252, understands them to represent the string 'Jazzâ€"Man'
.
要存储在utf8
列中,MySQL会将字符串转码成其UTF-8编码0x4a617a7ac3a2e282ace2809c4d616e
.可以使用SELECT HEX(name) FROM lessons WHERE id=79510
验证.
To store in a utf8
column, MySQL transcodes the string to its UTF-8 encoding 0x4a617a7ac3a2e282ace2809c4d616e
. This can be verified by using SELECT HEX(name) FROM lessons WHERE id=79510
.
当客户端检索到值时,MySQL认为它在latin1
中需要它,因此将其转码为Windows-1252编码的0x4a617a7ae280934d616e
.
When the client retrieves the value, MySQL thinks that it wants it in latin1
and so transcodes to the Windows-1252 encoding 0x4a617a7ae280934d616e
.
当客户端收到这些字节时,它将它们解码为UTF-8,因此将字符串理解为'Jazz–Man'
.
When the client receives those bytes, it decodes them as UTF-8 and therefore understands the string to be 'Jazz–Man'
.
结论:客户端没有意识到任何错误.仅当其他客户端(不会将其UTF-8连接错误声明为latin1
的客户端)尝试使用该表时,才会检测到问题.在您的情况下,这是在mysqldump获得数据导出时发生的.使用--default-character-set=latin1 --skip-set-charset
选项有效地迫使mysqldump的行为与您的应用程序相同,因此它最终得到了正确编码的数据.
Conclusion: the client doesn't realise anything is wrong. Problems are only detected when a different client (one that does not misstate its UTF-8 connection as latin1
) tries to use the table. In your case, this occurred when mysqldump obtained an export of the data; using the --default-character-set=latin1 --skip-set-charset
options effectively forced mysqldump to behave in the same broken way as your application, so it ended up with correctly encoded data.
要解决您的问题,您必须:
To fix your issue going forward, you must:
-
配置应用程序,使其正确设置其MySQL连接字符集(例如,在
config/database.yml
中为Rails设置encoding: utf8
);
Configure your application so that it correctly sets its MySQL connection character set (e.g. set
encoding: utf8
inconfig/database.yml
for Rails);
重新编码数据库中的数据,例如UPDATE lessons SET name = BINARY CONVERT(name USING latin1)
(请注意,必须对每个错误编码的文本列执行此操作).
Recode the data in your database, e.g. UPDATE lessons SET name = BINARY CONVERT(name USING latin1)
(note that this must be done for every misencoded text column).
还请注意,您可能需要自动执行这两个操作,这可能需要一些思考.
Also note that you will probably want to perform these two actions atomically, which may require some thought.
这篇关于utf8数据在mysql中看起来很好,但在rails中已损坏的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!