Problem description
Sorry for my English.
I have to convert strings from a json file like the following:
{"detalle":"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}
into something like this:
{"detalle":"el Expediente N° 30 de la Resolución 11..."}
and then write it to a txt file.
I tried:
json.dumps({"detalle":"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}, ensure_ascii=False).encode('utf8')
which returns:
'{"detalle": "el Expediente N\\\\u00b0\\\\u00a030 de la Resoluci\\\\u00f3n 11..."}'
How can I convert it?
Recommended answer
(In this answer, I'm assuming you use Python 2.)
First, let me explain why your snippet returns something different than you expect:
r1 = json.dumps({"detalle":"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}, ensure_ascii=False).encode('utf8')
print(r1)
r2 = json.dumps({"detalle":u"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}, ensure_ascii=False).encode('utf8')
print(r2)
This outputs:
{"detalle": "el Expediente N\\u00b0\\u00a030 de la Resoluci\\u00f3n 11..."}
{"detalle": "el Expediente N° 30 de la Resolución 11..."}
The difference is that in the first case the input is a plain byte string: the backslashes, the letter u, and the hex digits are literal ASCII characters that merely spell out the escapes. In the second case, the u prefix makes it a unicode string that contains the actual Unicode characters. The second case is what you want.
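The same distinction can be reproduced in Python 3, which is an assumption here since the answer targets Python 2. In this minimal sketch a raw string literal stands in for the Python 2 byte string without the u prefix; the variable names are illustrative only.

```python
import json

# Raw literal: the backslash, "u" and hex digits stay as separate ASCII
# characters, mimicking a Python 2 string written without the u prefix.
escaped_text = r"N\u00b0 30"
# Normal literal: \u00b0 is interpreted as the single character "°".
real_text = "N\u00b0 30"

dumped_escaped = json.dumps({"detalle": escaped_text}, ensure_ascii=False)
dumped_real = json.dumps({"detalle": real_text}, ensure_ascii=False)

# The first string is longer because it holds the escape spelled out.
print(len(escaped_text), len(real_text))
# json.dumps escapes the literal backslash, so it ends up doubled.
print(dumped_escaped)
# The real character is written through unchanged.
print(dumped_real)
```

Only the second dump contains the degree sign itself, which matches the two outputs shown above.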
Based on this, here is what I understand from your problem:
Normally, when you read a JSON file with the json module, the strings (which are escaped in the JSON file) are unescaped by the parser. If you still see escape sequences after loading, that indicates the strings were (accidentally?) double-escaped in the JSON file. In that case, try an extra unescape with s.decode('unicode-escape'):
data["detalle"] = data["detalle"].decode('unicode-escape')
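For readers on Python 3, where str has no decode method, a rough equivalent might look like the sketch below. The helper name unescape_once is hypothetical, and the sample JSON strings are shortened stand-ins for the file contents; treat this as one possible approach rather than the answer's own method.

```python
import json

def unescape_once(text):
    # Turn literal backslash-u sequences into the characters they name.
    # backslashreplace keeps any real non-ASCII characters intact by
    # escaping them first, so the round trip restores them.
    # Caveat: other backslash sequences (such as \n) are interpreted too.
    return text.encode("ascii", "backslashreplace").decode("unicode_escape")

# Correctly escaped JSON: the parser alone yields the real character.
normal = json.loads('{"detalle": "Resoluci\\u00f3n"}')
# Double-escaped JSON: after parsing, the escape is still spelled out.
double = json.loads('{"detalle": "Resoluci\\\\u00f3n"}')

fixed = unescape_once(double["detalle"])
print(normal["detalle"])
print(double["detalle"])
print(fixed)
```

The extra unescape should only be applied when the loaded strings really do still contain escape sequences; running it on correctly parsed data would alter any legitimate backslashes.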
Once you have proper unicode strings loaded in Python, converting them to bytes with s.encode('utf8') and writing the result to a file is correct.
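As an illustration of that last step, here is a small sketch that writes the text to a file as UTF-8. It uses io.open with an explicit encoding, which should behave the same on Python 2 and Python 3, instead of calling encode by hand; the file name detalle.txt and the temporary directory are assumptions made to keep the example self-contained.

```python
import io
import os
import tempfile

text = u"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."

# Hypothetical output location inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "detalle.txt")

# Opening in text mode with an encoding makes the file object encode
# the unicode string to UTF-8 bytes on write.
with io.open(out_path, "w", encoding="utf-8") as handle:
    handle.write(text)

# Read the file back in binary mode to inspect the bytes on disk.
with io.open(out_path, "rb") as handle:
    raw_bytes = handle.read()

print(raw_bytes == text.encode("utf-8"))
```

Either route, encoding manually and writing bytes or letting the file object encode, should put the same UTF-8 bytes on disk.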