python - In what world would \\u00c3\\u00a9 become é?

I have a JSON file from a source I don't control. It is probably encoded incorrectly, and it contains strings like these:

d\u00c3\u00a9cor

business\u00e2\u20ac\u2122 active accounts

the \u00e2\u20ac\u0153Made in the USA\u00e2\u20ac\u009d label

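From this, I gather that they intend \u00c3\u00a9 to become é, whose UTF-8 encoding is hex C3 A9. That makes some sense. For the other cases, I assume we are dealing with some kind of directional (curly) quotes.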
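To make that reasoning concrete, here is a small sketch that reproduces the garbled sequences by encoding the intended character as UTF-8 and then decoding the bytes with the wrong codec. The helper name `misdecode` and the choice of Windows-1252 as the wrong codec are assumptions for illustration; the source system's actual behavior is unknown.

```python
def misdecode(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_encoding)


# U+00E9 (e with acute accent) is the two bytes C3 A9 in UTF-8.
assert "\u00e9".encode("utf-8") == b"\xc3\xa9"

# Read as Windows-1252, those two bytes become U+00C3 and U+00A9.
assert misdecode("\u00e9") == "\u00c3\u00a9"

# The right single quote (U+2019) and left double quote (U+201C)
# produce the other sequences from the question in the same way.
assert misdecode("\u2019") == "\u00e2\u20ac\u2122"
assert misdecode("\u201c") == "\u00e2\u20ac\u0153"

# The right double quote (U+201D) ends in byte 0x9D, which Windows-1252
# leaves undefined. Python's cp1252 codec rejects it; the \u009d in the
# question suggests the source passed that byte through unchanged.
assert "\u201d".encode("utf-8") == b"\xe2\x80\x9d"
```

So the three samples are consistent with curly quotes and é being UTF-8 bytes that were misread as Windows-1252.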

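My theory is that this either uses some encoding I have never encountered before, or that it has been double-encoded somehow. I am fine with writing some code to turn their broken input into something I can understand, since it is unlikely they would be able to fix the system even if I brought it to their attention.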
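The double-encoding theory can be tested by running the mistake backwards: map each character back to the byte it presumably came from, then decode the bytes as UTF-8. This is only a minimal sketch under that assumption. The function names are hypothetical, and it works on the whole string at once, so a string that mixes garbled and correctly decoded non-ASCII text is returned unchanged.

```python
def to_byte(ch):
    """Map one character back to the single byte it was decoded from."""
    try:
        return ch.encode("cp1252")
    except UnicodeEncodeError:
        # Bytes with no Windows-1252 mapping (such as 0x9D) appear as C1
        # control characters, which Latin-1 maps straight back to bytes.
        return ch.encode("latin-1")


def undo_double_encoding(text):
    """Reverse a UTF-8-read-as-Windows-1252 mix-up, if the text fits it."""
    try:
        raw = b"".join(to_byte(ch) for ch in text)
        return raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not match the assumed mix-up; leave it alone.
        return text


samples = [
    "d\u00c3\u00a9cor",
    "business\u00e2\u20ac\u2122 active accounts",
    "the \u00e2\u20ac\u0153Made in the USA\u00e2\u20ac\u009d label",
]
for sample in samples:
    print(undo_double_encoding(sample))
```

All three sample strings decode cleanly this way, which supports the double-encoding explanation over an unknown encoding.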

Any ideas on how to turn their input into something I can understand? For the record, I am working in Python.

Best answer

You should try the ftfy module:

>>> import ftfy
>>> print(ftfy.fix_text("d\u00c3\u00a9cor"))
décor
>>> print(ftfy.fix_text("business\u00e2\u20ac\u2122 active accounts"))
business' active accounts
>>> print(ftfy.fix_text("the \u00e2\u20ac\u0153Made in the USA\u00e2\u20ac\u009d label"))
the "Made in the USA" label
>>> print(ftfy.fix_text("the \u00e2\u20ac\u0153Made in the USA\u00e2\u20ac\u009d label", uncurl_quotes=False))
the “Made in the USA” label
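Since the strings in the question live inside a JSON file, the fix has to be applied to every string in the parsed structure. The following is a rough sketch of one way to do that. The names `fix_strings` and `latin1_roundtrip` are hypothetical, and `latin1_roundtrip` is only a simple stand-in fixer so the example runs without third-party packages; in practice the ftfy function recommended above could be passed in its place.

```python
import json


def fix_strings(value, fix):
    """Apply fix() to every string in a parsed JSON value, keys included."""
    if isinstance(value, str):
        return fix(value)
    if isinstance(value, list):
        return [fix_strings(item, fix) for item in value]
    if isinstance(value, dict):
        return {fix_strings(k, fix): fix_strings(v, fix)
                for k, v in value.items()}
    return value  # numbers, booleans and None pass through untouched


def latin1_roundtrip(text):
    """Stand-in fixer: undo UTF-8 bytes that were read as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# json.loads turns the \uXXXX escapes into characters before fixing.
raw = r'{"title": "d\u00c3\u00a9cor", "tags": ["caf\u00c3\u00a9", "plain"], "count": 3}'
data = fix_strings(json.loads(raw), latin1_roundtrip)
print(data)
```

Passing the fixer as an argument keeps the traversal separate from the repair strategy, so the stand-in can be swapped for a more thorough one without touching the walk.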

For more on "python - In what world would \\u00c3\\u00a9 become é?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/26614323/