Using Python and BeautifulSoup (web page source code saved as a local file)

Problem description

Could you help me with this problem, please?

(Environment: Python 2.7 + BeautifulSoup 4.3.2)

I am trying to use Python and BeautifulSoup to extract information from a web page. Because the page is on a company website that requires login and redirection, I copied the source code of the target page into a file and saved it as "example.html" in C:\ to make practicing easier.

This is part of the original source:

<tr class="ghj">
    <td><span class="city-sh"><sh src="./citys/1.jpg" alt="boy" title="boy" /></span><a href="./membercity.php?mode=view&amp;u=12563">port_new_cape</a></td>
    <td class="position"><a href="./search.php?id=12563&amp;sr=positions" title="Search positions">452</a></td>
    <td class="details"><div>South</div></td>
    <td>May 09, 1997</td>
    <td>Jan 23, 2009 12:05 pm&nbsp;</td>
</tr>

The code I have worked out so far is:

from bs4 import BeautifulSoup
import re
import urllib2

url = "C:\example.html"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

cities = soup.find_all('span', {'class' : 'city-sh'})

for city in cities:
    print city

** This is just the first stage of testing, so it is not yet complete.

However, when I run it, I get an error message. It seems that "urllib2.urlopen" is not the right way to open a local file.

Traceback (most recent call last):
  File "C:\Python27\Testing.py", line 8, in <module>
    page = urllib2.urlopen(url)
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 404, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 427, in _open
    'unknown_open', req)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1247, in unknown_open
    raise URLError('unknown url type: %s' % type)
URLError:
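
A likely reading of the "unknown url type" message is that the drive letter in the Windows path is being parsed as a URL scheme, so the opener looks for a handler for a scheme named "c" and finds none. The short sketch below only illustrates that parsing step with the standard library's URL parser; it does not call urlopen, and the "file:///" form shown for comparison is an assumption about how the same path could be written as a URL.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2.7, as used in the question

# The path from the question, written as a raw string so the backslash
# is never treated as the start of an escape sequence.
windows_path = r"C:\example.html"

# Everything before the first colon is taken as the scheme.
path_scheme = urlparse(windows_path).scheme

# Assumed alternative spelling of the same location as a file URL.
file_url = "file:///C:/example.html"
url_scheme = urlparse(file_url).scheme

print(path_scheme)
print(url_scheme)
```

Seen this way, the plain path is not a URL at all, which is why reading the file with ordinary file I/O is usually the simpler route for a saved page.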

So could you please show me how I can practice using a local file? Thank you.
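
One common way to practice on a saved page is to skip urllib2 entirely and hand an open file object to BeautifulSoup. The following is a minimal sketch under stated assumptions, not a definitive solution: it writes a shortened copy of the table row from the question to a temporary file (standing in for C:\example.html), and it names the built-in "html.parser" explicitly, which is a choice made here rather than something the question specifies.

```python
import io
import os
import tempfile

from bs4 import BeautifulSoup

# Shortened stand-in for the saved page; in the question's setup this
# content would already exist on disk as C:\example.html.
sample_html = u"""<table>
<tr class="ghj">
    <td><span class="city-sh"></span><a href="./membercity.php?mode=view&amp;u=12563">port_new_cape</a></td>
    <td class="position"><a href="./search.php?id=12563&amp;sr=positions" title="Search positions">452</a></td>
    <td class="details"><div>South</div></td>
</tr>
</table>"""

# Hypothetical location used only so that this sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "example.html")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(sample_html)

# Open the local file with ordinary file I/O and let BeautifulSoup read it.
with io.open(path, encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# Same lookup as in the question, plus the text of the "position" cell.
cities = soup.find_all("span", {"class": "city-sh"})
positions = [td.get_text() for td in soup.find_all("td", {"class": "position"})]

print(len(cities))
print(positions)
```

For the file in the question, the temporary-file step would be dropped and the path replaced with r"C:\example.html"; the encoding passed to io.open is also an assumption and should match how the page was saved.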
