python - Error when using BeautifulSoup

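I want to extract some data from a website. I saved the page as "Web Page, HTML only" to a file named Soccerway.html on my desktop.

Then, in an IPython notebook, I ran the following commands: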

from bs4 import BeautifulSoup
soup=BeautifulSoup(open("soccerway.html"))

I get the following error:

IOError: [Errno 2] No such file or directory: 'soccerway.html'

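How can I fix this?

Best answer

You don't need to save the page manually. Use urllib2 (part of the Python 2 standard library) to fetch the HTML source directly: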

from bs4 import BeautifulSoup
from urllib2 import urlopen  # Python 2 only

# urlopen returns a file-like object that BeautifulSoup can read directly
soup = BeautifulSoup(urlopen("http://my_site.com/mypage"))
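
On Python 3 the urllib2 module no longer exists; its urlopen function lives in urllib.request. Here is a minimal sketch of the same approach under that assumption. The helper names fetch_html and fetch_soup and the 10-second timeout are illustrative choices, not part of the original answer, and "html.parser" just selects Python's built-in parser explicitly.

```python
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_soup(url, timeout=10):
    """Download a page and parse it with BeautifulSoup (hypothetical helper)."""
    # Imported here so fetch_html stays usable without bs4 installed.
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    return BeautifulSoup(fetch_html(url, timeout), "html.parser")
```

With these helpers, fetch_soup("http://my_site.com/mypage") would play the role of the one-liner above, where the URL is still the answer's placeholder.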

Example:

>>> from bs4 import BeautifulSoup
>>> from urllib2 import urlopen
>>> soup = BeautifulSoup(urlopen('http://google.com'))
>>> soup('a')
[<a class="gb1" href="http://www.google.com/imghp?hl=en&amp;tab=wi">Images</a>,
 ...
]
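
If you do want to parse the saved copy instead, the IOError most likely means that the relative path "soccerway.html" does not resolve to the file. The notebook's working directory (shown by os.getcwd()) may not be the desktop, and the file was saved as Soccerway.html with a capital S, which matters on case-sensitive file systems. The following is a rough sketch under those assumptions. The helper names find_saved_page and soup_from_file are hypothetical, and the desktop location in the usage note is a guess about the asker's setup.

```python
import os


def find_saved_page(directory, filename):
    """Return the full path of filename inside directory, ignoring case."""
    directory = os.path.expanduser(directory)  # expand a leading "~"
    wanted = filename.lower()
    for name in os.listdir(directory):
        if name.lower() == wanted:
            return os.path.join(directory, name)
    raise IOError("no file matching %r in %s" % (filename, directory))


def soup_from_file(path):
    """Parse a saved HTML file with BeautifulSoup (hypothetical helper)."""
    # Imported here so find_saved_page works without bs4 installed.
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    with open(path, "rb") as handle:
        return BeautifulSoup(handle, "html.parser")
```

Assuming the desktop folder is ~/Desktop, soup_from_file(find_saved_page("~/Desktop", "soccerway.html")) would then open the file by its full path regardless of the notebook's working directory.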

Regarding "python - Error when using BeautifulSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22792271/