This should work. If you somehow manage to guess the encoding, e.g. guess it as cp1252, then htmlstring.decode("cp1252").encode("us-ascii", "xmlcharrefreplace") will give you a file that contains only ASCII characters, and character references for everything else.

Now, how should you guess the encoding? Here is a strategy:

1. Use the encoding that was sent through the HTTP header. Be absolutely certain not to ignore this encoding.
2. Use the encoding in the XML declaration (if any).
3. Use the encoding in the http-equiv meta element (if any).
4. Use UTF-8.
5. Use Latin-1, and check that there are no characters in range(128, 160).
6. Use cp1252.
7. Use Latin-1.

In the order from 1 to 6, check whether you manage to decode the input. Notice that in step 5 you will definitely get a successful decoding; consider it a failure if you get any control characters (from range(128, 160)). Latin-1 is then tried again, unconditionally, in step 7. When you find the first encoding that decodes correctly, encode the result with ascii and xmlcharrefreplace, and you won't need to worry about the encoding anymore.

Regards,
Martin

I have a similar problem, with characters like ??üA?ü? and so on. I am extracting some content out of webpages, and they deliver whatever, sometimes not even giving any encoding information in the header. But your solution sounds quite good, I just do not know if
- it works with the characters I mentioned
- what encoding do you have in the end
- and how exactly are you doing all this? All with somestring.decode() or...
Can you please give an example for these 7 steps?

Thanks in advance for the help,
Chris

Christian Ergh wrote:

> - it works with the characters I mentioned

It does.

> - what encoding do you have in the end

US-ASCII

> - and how exactly are you doing all this? All with somestring.decode()
> or... Can you please give an example for these 7 steps?

I could, but I don't have the time. Just try to come up with some code, and I'll try to comment on it.

Regards,
Martin
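Since the thread leaves the code as an exercise, here is a minimal sketch of the seven steps in Python 3 (where decode() is a method of bytes). It is not Martin's code. The names guess_and_convert and try_decode are hypothetical, the two regular expressions for the XML declaration and the http-equiv meta element are simplified assumptions rather than a real HTML parser, and header_charset is assumed to be whatever charset the caller already pulled out of the HTTP Content-Type header.

```python
import re

# Simplified patterns (an assumption, not a full HTML/XML parser) that pull a
# charset name out of the raw bytes for steps 2 and 3.
XML_DECL_RE = re.compile(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')
META_RE = re.compile(
    rb'<meta[^>]+http-equiv=["\']?content-type["\']?[^>]*charset=([A-Za-z0-9._-]+)',
    re.IGNORECASE,
)


def try_decode(data, encoding):
    """Return data decoded with encoding, or None if that fails."""
    if not encoding:
        return None
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name, or bytes that are invalid in this encoding.
        return None


def guess_and_convert(data, header_charset=None):
    """Decode raw HTML bytes with the first encoding that works, then
    re-encode as US-ASCII with character references for everything else."""
    xml_match = XML_DECL_RE.search(data)
    meta_match = META_RE.search(data)
    candidates = [
        header_charset,  # 1. charset from the HTTP header (hypothetical input)
        xml_match.group(1).decode("ascii") if xml_match else None,  # 2.
        meta_match.group(1).decode("ascii") if meta_match else None,  # 3.
        "utf-8",  # 4.
    ]
    text = None
    for encoding in candidates:
        text = try_decode(data, encoding)
        if text is not None:
            break

    if text is None:
        # 5. Latin-1 decodes any byte string, so treat C1 control
        # characters (range(128, 160)) as a sign that it is wrong.
        latin = data.decode("latin-1")
        if not any(128 <= ord(ch) < 160 for ch in latin):
            text = latin

    if text is None:
        text = try_decode(data, "cp1252")  # 6.

    if text is None:
        text = data.decode("latin-1")  # 7. last resort, always succeeds

    return text.encode("us-ascii", "xmlcharrefreplace")
```

For example, guess_and_convert(b"\x93quoted\x94") rejects UTF-8 and Latin-1 (0x93 and 0x94 are C1 control characters there), succeeds with cp1252, and returns b"&#8220;quoted&#8221;". Whatever the input, the result is pure US-ASCII, which matches the answer given above. Note that cp1252 is not a catch-all: Python's codec rejects a few undefined bytes such as 0x81, which is exactly the case step 7 covers.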
09-02 10:54