Question
I have asked about this problem before, but after some testing I decided to open a new question with more specific information:
I am reading user accounts from our Active Directory with python-ldap (and Python 2.7). This works well, but I have problems with special characters. They look like UTF-8 encoded strings when printed on the console. The goal is to write them into a MySQL database, but I can't get those strings into proper UTF-8 in the first place.
Example (fullentries is my array with all the AD entries):
fullentries[23][1].decode('utf-8', 'ignore')
print fullentries[23][1].encode('utf-8', 'ignore')
print fullentries[23][1].encode('latin1', 'ignore')
print repr(fullentries[23][1])
A second test with a string inserted by hand as follows:
testentry = "M\xc3\xbcller"
testentry.decode('utf-8', 'ignore')
print testentry.encode('utf-8', 'ignore')
print testentry.encode('latin1', 'ignore')
print repr(testentry)
The output of the first example is:
M\xc3\xbcller
M\xc3\xbcller
u'M\\xc3\\xbcller'
Edit: If I try to replace the double backslashes with .replace('\\\\','\\') the output remains the same.
The output of the second example is:
Müller
M�ller
'M\xc3\xbcller'
Is there any way to get the AD output properly encoded? I already read a lot of documentation, but it all states that LDAPv3 gives you strictly UTF-8 encoded strings. Active Directory uses LDAPv3.
My older question on this topic is here: Writing UTF-8 String to MySQL with Python
Edit: Added the repr(s) output.
Recommended answer
First, know that printing to a Windows console is often the step that garbles data, so for your tests you should print repr(s) to see the precise bytes you have in your string.
You need to find out how the data from AD is encoded. Again, print repr(s) will let you see the content of the data.
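As a rough illustration of why repr is the more trustworthy view, the sketch below compares the two kinds of value that appear in the question: a byte string holding real UTF-8 bytes, and a unicode string that merely contains the text of the escape sequences. The variable names are made up for this example, and it is written to run on both Python 2.7 and Python 3.

```python
# Hypothetical names, chosen only for this illustration.
real_bytes = b"M\xc3\xbcller"       # 7 bytes; 0xC3 0xBC is UTF-8 for u-umlaut
escaped_text = u"M\\xc3\\xbcller"   # 13 characters; the backslashes are literal

# repr shows exactly what each object holds, independent of the console encoding.
print(repr(real_bytes))
print(repr(escaped_text))

# The lengths make the difference obvious.
print(len(real_bytes))
print(len(escaped_text))

# Only the real byte string decodes directly to the intended name.
print(repr(real_bytes.decode("utf-8")))
```

The second value matches the shape of the repr shown for the AD entry in the question, which suggests the data arrives with the escapes already written out as text rather than as raw bytes.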
Update:
OK, it looks like you're getting strange strings somehow. There might be a better way to retrieve them, but you can adapt in any case, though it isn't pretty:
u.decode('unicode_escape').encode('iso8859-1').decode('utf8')
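The one-liner above can be unpacked into steps to see what each call contributes. This is only a sketch under the assumption that the value is a unicode string of plain ASCII containing literal \xNN escapes, as the repr in the question suggests. The function name unescape_utf8 is invented for this example, and the explicit encode("ascii") step spells out what Python 2 does implicitly when decode is called on a unicode object, so the same code also runs on Python 3.

```python
def unescape_utf8(text):
    """Turn text like u'M\\xc3\\xbcller' into the string it was meant to be.

    Assumes `text` is pure ASCII with literal backslash escapes; any
    non-ASCII input raises UnicodeEncodeError at the first step.
    """
    raw = text.encode("ascii")                # text -> bytes, backslashes still literal
    chars = raw.decode("unicode_escape")      # "\xc3" text -> the character U+00C3
    utf8_bytes = chars.encode("iso8859-1")    # U+00C3 -> the single byte 0xC3
    return utf8_bytes.decode("utf8")          # bytes 0xC3 0xBC -> u-umlaut


fixed = unescape_utf8(u"M\\xc3\\xbcller")
print(repr(fixed))
```

The ISO 8859-1 step works because that encoding maps code points 0 to 255 onto the identical byte values, so it recovers the original UTF-8 bytes unchanged. The result is a unicode string, which still has to be encoded as UTF-8 when it is written to the database.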
You might want to look into whether you can get the data in a more natural format.