This article covers how to handle the "failed to load external entity" error when using Python lxml.

Problem description

I'm trying to parse an XML document I retrieve from the web, but parsing crashes with this error:

': failed to load external entity "<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="GreenButtonDataStyleSheet.xslt"?>

That is the second line of the downloaded XML. Is there a way to prevent the parser from trying to load the external entity, or another way to solve this? This is the code I have so far:

import urllib2
import lxml.etree as etree

file = urllib2.urlopen("http://www.greenbuttondata.org/data/15MinLP_15Days.xml")
data = file.read()
file.close()

tree = etree.parse(data)

Recommended answer

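In line with what mzjn said: etree.parse() expects a filename, URL, or file-like object, so a string of XML is treated as a path to load, which is why the document text itself shows up in the error message. If you do want to pass a string to etree.parse(), just wrap it in a StringIO object.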

Example:

from lxml import etree
from StringIO import StringIO

myString = "<html><p>blah blah blah</p></html>"

tree = etree.parse(StringIO(myString))

This approach is used in the lxml documentation.
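
The code in this thread targets Python 2 (urllib2 and the StringIO module). On Python 3 the same idea can be sketched roughly as follows, assuming lxml is installed. The `data` bytes are a made-up stand-in for whatever the HTTP response's read() returns, and the `feed`/`entry` element names are hypothetical, not taken from the real Green Button file. Bytes are used rather than str because lxml rejects a str that carries an encoding declaration.

```python
from io import BytesIO

from lxml import etree

# Hypothetical stand-in for the bytes downloaded from the web
data = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<?xml-stylesheet type="text/xsl" href="GreenButtonDataStyleSheet.xslt"?>\n'
    b'<feed><entry>15</entry></feed>'
)

# Option 1: parse the in-memory bytes directly; this returns the root element
root = etree.fromstring(data)

# Option 2: wrap the bytes in a file-like object so etree.parse() reads
# from it instead of treating the content as a filename; this returns
# an ElementTree
tree = etree.parse(BytesIO(data))

print(root.tag)
print(tree.getroot().findtext("entry"))
```

Either way, the content is parsed as XML instead of being treated as a path to open, which is what triggered the error in the question.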
