Problem description
I am very confused by the way charset and encoding work in SQLAlchemy. I understand (and have read about) the difference between charsets and encodings, and I have a good picture of the history of encodings.
I have a table in MySQL that uses latin1_swedish_ci (why? possibly because it was the server default). I need to create a pandas DataFrame in which I get the proper characters (and not weird symbols). Initially, this was in the code:
import pandas
from sqlalchemy import create_engine

connect_engine = create_engine('mysql://user:password@host/db')
sql_query = "select * from table1"
df = pandas.read_sql(sql_query, connect_engine)
We started having trouble with the Š character: it corresponds to the unicode string u'\u0160', but instead we get '\x8a'. I expected this to work:
connect_engine = create_engine('mysql://user:password@host/db', encoding='utf8')
but I kept getting '\x8a', which, I realized, makes sense given that the default value of the encoding parameter is already utf8. So then I tried encoding='latin1' to tackle the problem:
connect_engine = create_engine('mysql://user:password@host/db', encoding='latin1')
but I still get the same '\x8a'. To be clear, in both cases (encoding='utf8' and encoding='latin1'), I can do mystring.decode('latin1') but not mystring.decode('utf8').
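The decoding behaviour described here can be reproduced without a database. The following is a minimal sketch in Python 3 (the question itself uses Python 2 strings). It assumes the column bytes are Windows-1252, which is what MySQL calls latin1, and the variable names are illustrative only.

```python
# The single byte the question reports for the Š character.
raw = b"\x8a"

# Python's 'latin1' (ISO-8859-1) maps every byte to a code point, so this
# never fails, but 0x8A becomes a control character rather than Š.
as_latin1 = raw.decode("latin1")

# MySQL's latin1 is effectively Windows-1252, where 0x8A is Š (U+0160).
as_cp1252 = raw.decode("cp1252")

# A lone 0x8A is not valid UTF-8, so decoding raises an error.
try:
    raw.decode("utf8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_latin1), repr(as_cp1252), utf8_ok)
```

This is why decode('latin1') appears to "work" while still not producing Š: it succeeds for any byte, whether or not the result is the intended character.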
And then I rediscovered the charset parameter in the connection string, i.e. 'mysql://user:password@host/db?charset=latin1'. After trying all possible combinations of charset and encoding, I found that this one works:
connect_engine = create_engine('mysql://user:password@host/db?charset=utf8')
I would appreciate it if somebody could explain to me how to correctly use the charset parameter in the connection string and the encoding parameter of create_engine.
Recommended answer
encoding is the codec used for encoding/decoding within SQLAlchemy. From the documentation:
[...]
To properly configure a system to accommodate Python unicode objects, the DBAPI should be configured to handle unicode to the greatest degree as is appropriate [...]
mysql-python handles unicode directly, so there's no need to use this setting.
charset is a setting specific to the mysql-python driver: it sets the character set of the client connection.
This setting controls three variables on the server (character_set_client, character_set_connection and character_set_results); character_set_results is the one you are interested in. When charset is set, strings are returned as unicode objects.
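The effect of character_set_results can be pictured with a small simulation. This is only a sketch under stated assumptions: no database or driver is involved, the two fetch_* helpers are hypothetical stand-ins for what the server and mysql-python do, and the stored data is assumed to be Windows-1252 (MySQL's latin1).

```python
# Simulation only: no database is involved. MySQL's "latin1" is in practice
# Windows-1252 (cp1252), which is why Š is stored as the single byte 0x8A.
STORED_BYTES = "Š".encode("cp1252")


def fetch_without_charset(stored: bytes) -> bytes:
    """Hypothetical helper: the column bytes come back undecoded."""
    return stored


def fetch_with_charset(stored: bytes, charset: str = "utf8") -> str:
    """Hypothetical helper mimicking character_set_results plus driver decoding."""
    # Server side: convert the column data to the connection character set.
    wire_bytes = stored.decode("cp1252").encode(charset)
    # Driver side: decode the wire bytes with that same character set.
    return wire_bytes.decode(charset)


raw = fetch_without_charset(STORED_BYTES)
text = fetch_with_charset(STORED_BYTES)
print(repr(raw))
print(repr(text))
```

In this picture, the value of charset matters less than the fact that server and driver agree on it, so the driver can hand back text rather than raw bytes.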
Note that this applies only if the data in the database really is latin1-encoded. If you've stored utf-8 bytes in a latin1 column, you may have better luck using encoding instead.
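The difference between the two cases can be illustrated with plain strings. This is a rough sketch in Python 3, not a recipe for any particular database: it assumes cp1252 as the Python equivalent of MySQL's latin1, and the variable names are made up for the example.

```python
original = "Š"

# Case 1: the column really holds latin1 (cp1252) data, one byte per character.
genuine_latin1 = original.encode("cp1252")

# Case 2: UTF-8 bytes were written into a latin1 column unchanged.
utf8_in_latin1_column = original.encode("utf-8")

# Reading case 2 as latin1/cp1252 yields mojibake instead of Š.
mojibake = utf8_in_latin1_column.decode("cp1252")

# Reversing that step and decoding as UTF-8 recovers the text.
repaired = mojibake.encode("cp1252").decode("utf-8")

print(genuine_latin1, utf8_in_latin1_column, repr(mojibake), repaired)
```

If the data looks like the mojibake case, a charset conversion on the connection alone will not fix it, because the server believes the bytes are already valid latin1 text.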