Problem description
Could you help me with this problem, please?
(Environment: Python 2.7 + BeautifulSoup 4.3.2)
I am trying to use Python and BeautifulSoup to extract information from a web page. The page is on the company website, which requires login and redirection, so I copied the source code of the target page into a file and saved it as "example.html" in C:\ to make practicing easier.
This is part of the original source code:
<tr class="ghj">
<td><span class="city-sh"><sh src="./citys/1.jpg" alt="boy" title="boy" /></span><a href="./membercity.php?mode=view&u=12563">port_new_cape</a></td>
<td class="position"><a href="./search.php?id=12563&sr=positions" title="Search positions">452</a></td>
<td class="details"><div>South</div></td>
<td>May 09, 1997</td>
<td>Jan 23, 2009 12:05 pm </td>
</tr>
The code I have worked out so far is:
from bs4 import BeautifulSoup
import re
import urllib2
url = "C:\example.html"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
cities = soup.find_all('span', {'class' : 'city-sh'})
for city in cities:
    print city
** This is only the first stage of testing, so it is not complete yet.
However, when I run it, it gives an error message. It seems that "urllib2.urlopen" is not the proper way to open a local file.
Traceback (most recent call last):
  File "C:\Python27\Testing.py", line 8, in <module>
    page = urllib2.urlopen(url)
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 404, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 427, in _open
    'unknown_open', req)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1247, in unknown_open
    raise URLError('unknown url type: %s' % type)
URLError:
So could you please show me how I can practice with a local file? Thank you.
Recommended answer
With Chandan's help, the problem is solved. Credit goes to him. :)
"urllib2.urlopen" is not needed here; the built-in open() reads the local file directly.
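from bs4 import BeautifulSoup

# Raw string so the backslash in the Windows path is taken literally.
path = r"C:\example.html"

# A local file is read with the built-in open(); urllib2 is not needed.
with open(path) as page:
    soup = BeautifulSoup(page.read())

# Collect every <span class="city-sh"> element and print it.
cities = soup.find_all('span', {'class': 'city-sh'})
for city in cities:
    print city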
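
As background on why the original call failed, here is a rough sketch (not part of the accepted fix) of what the URL parser appears to make of a Windows path, and of how a local file could still be opened through urlopen by giving it a file: URL. The helper name path_to_file_url and the throwaway temporary file are assumptions made for illustration only; the try/except import is there so the same sketch runs on Python 2.7 and Python 3.

```python
import os
import tempfile

try:  # Python 2.7, as in the question
    from urllib import pathname2url
    from urllib2 import urlopen
    from urlparse import urlparse, urljoin
except ImportError:  # Python 3 equivalents
    from urllib.request import pathname2url, urlopen
    from urllib.parse import urlparse, urljoin


def path_to_file_url(path):
    """Turn a local filesystem path into a file: URL (hypothetical helper)."""
    return urljoin("file:", pathname2url(os.path.abspath(path)))


# The drive letter is parsed as the URL scheme, which is why urlopen
# reports an unknown url type for a bare Windows path.
print(urlparse(r"C:\example.html").scheme)  # prints: c

# Demonstrate on a throwaway file rather than the real C:\example.html.
handle, sample_path = tempfile.mkstemp(suffix=".html")
with os.fdopen(handle, "w") as f:
    f.write('<span class="city-sh">demo</span>')

sample_url = path_to_file_url(sample_path)
html = urlopen(sample_url).read()  # raw contents of the local file
os.remove(sample_path)
```

For a file that is already on disk, the plain open() call in the answer above remains the simpler choice; the file: URL route is mainly useful when the same code path must handle both remote and local sources.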