Problem description
I have this string: Traor\u0102\u0160
Traor\u0102\u0160 should produce Traoré. Then Traoré, UTF-8 decoded, should produce Traorè.
How can I convert it to Traorè?
What kind of characters are in Traor\u0102\u0160? Unicode?
I've already read http://docs.python.org/howto/unicode.html#encodings many times, but I'm still really confused.
I get this data with the following request:
import json
import requests
# making a request to get this json
r = requests.get('http://cdn.content.easports.com/fifa/fltOnlineAssets/2013/fut/items/web/199074.json')
print(r.json())  # on requests < 1.0, r.json is a property: print(r.json)
Solution
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import json
import requests
headers = {'Content-Type': 'application/json'}
r = requests.get('http://cdn.content.easports.com/fifa/fltOnlineAssets/2013/fut/items/web/199074.json', headers=headers)
print(r.content.decode('utf-8'))  # r.content holds the raw response bytes
# prints:
{"Item":{"FirstName":"Lacina","LastName":"Traoré","CommonName":null,"Height":"203","DateOfBirth":{"Year":"1990","Month":"8","Day":"20"},"PreferredFoot":"Left","ClubId":"100766","LeagueId":"67","NationId":"108","Rating":"78","Attribute1":"79","Attribute2":"71","Attribute3":"45","Attribute4":"69","Attribute5":"50","Attribute6":"72","Rare":"1","ItemType":"PlayerA"}}
Basically, I needed to send the right headers.
Thank you all.
For me, your site returns "Traor\u00e9" (the last character is é):
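import json
import requests
url = 'http://cdn.content.easports.com/fifa/fltOnlineAssets/2013/fut/items/web/199074.json'
r = requests.get(url)
print(json.dumps(json.loads(r.content)['Item']['LastName']))
# -> "Traor\u00e9" -> Traoré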
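The \u00e9 in that output is only how json.dumps writes non-ASCII characters by default; the parsed string itself already contains é. A minimal sketch using only the standard library, where the sample bytes are made up for illustration and merely mirror the LastName field of the response:

```python
import json

# Hypothetical sample mirroring the "LastName" field of the response above.
raw = '{"Item": {"LastName": "Traoré"}}'.encode("utf-8")

# Decode the bytes explicitly, then parse; the result is an ordinary str.
name = json.loads(raw.decode("utf-8"))["Item"]["LastName"]

escaped = json.dumps(name)                       # ASCII-only JSON, é written as an escape
readable = json.dumps(name, ensure_ascii=False)  # keeps é as a literal character

print(escaped)
print(readable)
```

Both forms describe the same string; parsing either one with json.loads gives back the name with a real é.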
r.json (and r.text) produce incorrect content here. Either the server, requests, or both use an incorrect encoding, which results in "Traor\u0102\u0160". The encoding of JSON text is completely determined by its content, so it is always possible to decode it regardless of the headers the server sends. From the JSON RFC (RFC 4627, section 3):
00 00 00 xx UTF-32BE
00 xx 00 xx UTF-16BE
xx 00 00 00 UTF-32LE
xx 00 xx 00 UTF-16LE
xx xx xx xx UTF-8
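The table can be turned into a small detection helper. This is only a sketch: the function names detect_json_encoding and loads_bytes are hypothetical, byte order marks are not handled, and falling back to UTF-8 for very short input is an assumption rather than something the RFC table states.

```python
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of JSON bytes from the zero-byte pattern of the first four bytes."""
    head = data[:4]
    if len(head) < 4:
        return "utf-8"  # assumption: too short to inspect, use the default
    zeros = tuple(b == 0 for b in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(zeros, "utf-8")           # xx xx xx xx


def loads_bytes(data: bytes):
    """Decode JSON bytes with the detected encoding, then parse them."""
    return json.loads(data.decode(detect_json_encoding(data)))


# Made-up sample, encoded as UTF-16LE to exercise a non-default row of the table.
sample = '{"LastName": "Traoré"}'.encode("utf-16-le")
print(detect_json_encoding(sample))
print(loads_bytes(sample))
```

On Python 3.6 and later, json.loads accepts bytes directly and performs a similar UTF-8/UTF-16/UTF-32 detection itself, so a helper like this is mainly useful for seeing how the rule works.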
In this case there are no zero bytes at the start of r.content, so it is UTF-8 and json.loads works on it directly. Otherwise, you need to convert it to a Unicode string manually, either because the server sends an incorrect character encoding in the Content-Type header or to work around a bug in requests.
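As a rough illustration of the manual conversion and of where "Traor\u0102\u0160" probably comes from: the two UTF-8 bytes of é (C3 A9) turn into U+0102 and U+0160 when they are decoded as ISO-8859-2. That ISO-8859-2 was the charset actually applied is an assumption inferred from those code points, not something confirmed in this thread, and the sample bytes below are made up.

```python
import json

# Hypothetical response body: UTF-8 encoded JSON, as the server appears to send it.
content = '{"Item": {"LastName": "Traoré"}}'.encode("utf-8")

# Decoding with the wrong charset (assumed here: ISO-8859-2) yields the garbled name.
wrong = json.loads(content.decode("iso-8859-2"))["Item"]["LastName"]

# Converting the bytes to a Unicode string manually, with the correct charset.
right = json.loads(content.decode("utf-8"))["Item"]["LastName"]

# An already-garbled string can be repaired by reversing the wrong decode.
repaired = wrong.encode("iso-8859-2").decode("utf-8")

print(json.dumps(wrong))
print(json.dumps(right))
print(repaired == right)
```

With requests, the same manual conversion would be json.loads(r.content.decode('utf-8')), or setting r.encoding = 'utf-8' before reading r.text.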