Using urllib together with ElementTree

Problem description

I'm having a few problems parsing simple HTML with the ElementTree module from the Python standard library. This is my source code:

from urllib.request import urlopen
from xml.etree.ElementTree import ElementTree

import sys

def main():
    site = urlopen("http://1gabba.in/genre/hardstyle")
    try:
        html = site.read().decode('utf-8')
        xml = ElementTree(html)
        print(xml)
        print(xml.findall("a"))
    except:
        print(sys.exc_info())

if __name__ == '__main__':
    main()

However, this fails, and I get the following output on my console:

<xml.etree.ElementTree.ElementTree object at 0x00000000027D14E0>
(<class 'AttributeError'>, AttributeError("'str' object has no attribute 'findall'",), <traceback object at 0x0000000002910B88>)

So xml is indeed an ElementTree object, and according to the documentation the ElementTree class has a findall method. One more thing: xml.find("a") works fine, but it returns an int instead of an Element instance.

So could anybody help me out? What am I misunderstanding?

Recommended answer
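A likely explanation, offered as a sketch rather than a definitive diagnosis: the first argument of the ElementTree constructor is meant to be an already-parsed root Element, not markup text. In the question's code the decoded string appears to have been stored as the root without being parsed, so find and findall were forwarded to the str object. That would explain both symptoms: str has no findall, and str.find("a") returns an integer index. Parsing the text with fromstring first yields a real Element. The example below assumes well-formed markup. SAMPLE and find_links are hypothetical names used only for illustration, and SAMPLE stands in for the downloaded page so that the example runs without network access.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the downloaded page
# (assumption: this is NOT the real content of the site in the question).
SAMPLE = (
    "<html><body>"
    "<a href='/one'>One</a>"
    "<p><a href='/two'>Two</a></p>"
    "</body></html>"
)


def find_links(markup):
    """Parse markup text and return the href of every <a> element."""
    # fromstring() parses the text and returns the root Element.
    root = ET.fromstring(markup)
    # ".//a" searches the whole tree; a bare "a" would only match
    # direct children of the root element.
    return [a.get("href") for a in root.findall(".//a")]


if __name__ == "__main__":
    print(find_links(SAMPLE))
```

With the code from the question, the same idea would mean passing the decoded response text to fromstring instead of to the ElementTree constructor. Note that real-world HTML is often not well-formed XML, in which case fromstring raises a ParseError. The standard library's html.parser module is one alternative for such pages.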