Handling encoding and decoding in Python 2.7

Question

I have a problem involving encoding/decoding. I read text from a file and compare it with text from a database (Postgres). The comparison is done between two lists.

From the file I get "jo\x9a" for "još", and from the database I get "jo\xc5\xa1" for the same value.

# Items in both lists
common = [a for a in codes_from_file if a in kode_prfoksov]

# Items only in the first list
only1 = [a for a in codes_from_file if a not in kode_prfoksov]

# Items only in the second list
only2 = [a for a in kode_prfoksov if a not in codes_from_file]

How can I solve this? Which encoding should be used when comparing these two strings?

Thanks

Recommended answer

The strings from your file appear to be Windows-1250 encoded, while your database appears to contain UTF-8 strings: in Windows-1250 the byte 0x9a is "š", and 0xc5 0xa1 is the UTF-8 encoding of the same character.
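A quick check supports this guess. The sketch below uses only the two byte strings quoted in the question. Note that byte 0x9a also maps to "š" in Windows-1252, so the exact code page of the file remains an assumption.

```python
# Byte strings exactly as quoted in the question.
from_file = b"jo\x9a"
from_db = b"jo\xc5\xa1"

# Compared as raw bytes, the two values differ.
raw_equal = from_file == from_db

# Decoded with the assumed encodings, both become the same text.
file_text = from_file.decode("windows-1250")
db_text = from_db.decode("utf-8")
decoded_equal = file_text == db_text
```

Here `raw_equal` is false and `decoded_equal` is true, which is why the list comparisons fail until both sides use the same representation.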

So you can either convert all strings to unicode first:

# Decode each list with the encoding of its own source
codes_from_file = [a.decode("windows-1250") for a in codes_from_file]
kode_prfoksov = [a.decode("utf-8") for a in kode_prfoksov]
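Once both lists hold unicode strings, the comparisons from the question should behave as expected. A minimal sketch with made-up sample data: only the variable names come from the question, while the list contents and the use of sets for faster membership tests are assumptions added for illustration.

```python
# Hypothetical sample data: raw bytes as they might arrive from each source.
codes_from_file = [b"jo\x9a", b"abc"]      # Windows-1250 bytes (assumed)
kode_prfoksov = [b"jo\xc5\xa1", b"xyz"]    # UTF-8 bytes (assumed)

# Decode each list with its own encoding so both hold text, not bytes.
codes_from_file = [a.decode("windows-1250") for a in codes_from_file]
kode_prfoksov = [a.decode("utf-8") for a in kode_prfoksov]

# Sets avoid rescanning a whole list for every membership test.
file_set = set(codes_from_file)
db_set = set(kode_prfoksov)

common = [a for a in codes_from_file if a in db_set]
only1 = [a for a in codes_from_file if a not in db_set]
only2 = [a for a in kode_prfoksov if a not in file_set]
```

With this data, "još" lands in `common`, while "abc" and "xyz" end up in `only1` and `only2` respectively.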

Or, if you do not want unicode strings, just convert the file strings to UTF-8:

codes_from_file = [a.decode("windows-1250").encode("utf-8") for a in codes_from_file]
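A related option, not part of the answer above, is to decode while reading the file, so the list never contains raw bytes in the first place. The sketch below assumes the file holds one code per line and is Windows-1250 encoded; the temporary file and its contents are hypothetical stand-ins for the real input.

```python
import io
import os
import tempfile

# Create a throwaway file containing Windows-1250 bytes (hypothetical data).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"jo\x9a\nabc\n")

# io.open decodes while reading, so every line is already a unicode string.
with io.open(path, encoding="windows-1250") as f:
    codes_from_file = [line.strip() for line in f]

# Clean up the temporary file.
os.remove(path)
```

The resulting list can then be compared directly against database values that have been decoded from UTF-8.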


07-31 13:38