Problem description
I saved an XML page locally from the Merriam-Webster API. Here is the URL: http://www.dictionaryapi.com/api/v1/references/collegiate/xml/apple?key=bf534d02-bf4e-49bc-b43f-37f68a0bf4fd
That is just an example. I fetch the page with urlretrieve and save it as an XML file.
Now I want to open it, but a UnicodeDecodeError occurs.
This is what I did:
page = open('test.xml')
bs = BeautifulSoup(page)
Then the following error occurs:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb
I tried passing the filename as a unicode string, u'test.xml', but it didn't work.
The encoding configuration is already utf-8, and that doesn't solve the problem. Thanks for the advice anyway.
Recommended answer
You need to specify the encoding as utf-8 when opening the file, because that is how the data is encoded. The filename has nothing to do with the file's contents, so prefixing it with u to make it a unicode string will not help:
import io
# Open the file as text and decode it explicitly as UTF-8
with io.open('test.xml', encoding="utf-8") as page:
    # Parse while the file is still open
    bs = BeautifulSoup(page)
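To see why the explicit encoding matters, here is a minimal standard-library sketch that reproduces the failure without BeautifulSoup. The sample XML string and the file name `sample.xml` are assumptions made up for illustration, not the real API response. The sample contains U+02C8, a stress mark, which encodes to the bytes 0xcb 0x88 in UTF-8, so decoding it as ASCII fails on byte 0xcb in the same way as the error above.

```python
import io
import os
import tempfile

# Hypothetical sample content (not the real API response).
# U+02C8 encodes to b'\xcb\x88' in UTF-8.
sample = u'<entry><pr>\u02c8a-p\u0259l</pr></entry>'

# Write the sample to a temporary file as UTF-8.
path = os.path.join(tempfile.mkdtemp(), 'sample.xml')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(sample)

# Read the raw bytes back, as a plain binary read would.
with io.open(path, 'rb') as f:
    raw = f.read()

# Decoding those bytes as ASCII fails on the first non-ASCII byte (0xcb).
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Opening the file with an explicit UTF-8 encoding decodes it cleanly.
with io.open(path, encoding='utf-8') as f:
    text = f.read()
```

After this runs, `ascii_error` holds the decoding exception and `text` holds the original unicode string. That is the difference between letting the ASCII codec be applied implicitly and naming the encoding yourself.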