So, I want to grab all the images (the NBA team logos) from this page:
http://www.cbssports.com/nba/draft/mock-draft
However, my code gives me more than I need. It gives me:
<a href="/nba/teams/page/ORL"><img src="http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png" alt="Orlando Magic" width="30" height="30" border="0" /></a>
How can I trim it down so that it gives me only this?
http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png
My code:
import urllib2
from BeautifulSoup import BeautifulSoup
# or if you're using BeautifulSoup4:
# from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read())
# first table with class "data borderTop"; skip its first two (header) rows
rows = soup.findAll("table", attrs = {'class': 'data borderTop'})[0].tbody.findAll("tr")[2:]
for row in rows:
    fields = row.findAll("td")
    if len(fields) >= 3:
        # the team link (wrapping the logo image) sits in the second cell
        anchor = fields[1].find("a")
        if anchor:
            print anchor
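In the question's loop, the `src` value can be read off the image inside the anchor rather than printing the whole `<a>` element; in BeautifulSoup, `anchor.img` gives the first `<img>` inside the anchor and indexing a tag such as `anchor.img['src']` returns that attribute's value. As a dependency-free alternative, here is a minimal Python 3 sketch using the standard library's `html.parser`. It runs on the sample anchor from the question rather than the live page, and the `ImgSrcCollector` class and `image_sources` helper are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        # Self-closing tags like <img ... /> are routed here too by the
        # default handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def image_sources(html):
    """Return the src of each <img> in the markup, in document order."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs


# Sample markup copied from the question (one row's anchor).
sample = ('<a href="/nba/teams/page/ORL"><img src="http://sports.cbsimg.net/'
          'images/nba/logos/30x30/ORL.png" alt="Orlando Magic" width="30" '
          'height="30" border="0" /></a>')

for src in image_sources(sample):
    print(src)
```

This prints only the logo URL, http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png. Fed the whole downloaded page, it would return every image on it, so you would still filter the results (for example by the `/logos/30x30/` path).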
Best answer
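I know this may be "traumatic", but for auto-generated pages like this one, where you just want to grab those pesky images and never come back, a quick-n-dirty regex that matches the pattern you need is usually my choice (not depending on Beautiful Soup is a big plus):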
import urllib, re
source = urllib.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read()
## the loop below just prints each link;
## if you want to actually download the images, set this flag to True
actually_download = False
## every image name is an abbreviation composed of capital letters, so...
## (dots are escaped so they match literally; + requires at least one letter)
for link in re.findall(r'http://sports\.cbsimg\.net/images/nba/logos/30x30/[A-Z]+\.png', source):
    print link
    if actually_download:
        filename = link.split('/')[-1]
        urllib.urlretrieve(link, filename)
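The answer's code is Python 2; in Python 3, `urllib.urlopen` became `urllib.request.urlopen`. A minimal Python 3 sketch of the same regex idea follows, under these assumptions: it runs on a hypothetical inline sample instead of fetching the page (the live page's markup has very likely changed since), it escapes the dots so they match literally, and it drops repeated links while keeping page order. `fetch`, `logo_links` and `LOGO_RE` are illustrative names, not part of the original answer.

```python
import re
import urllib.request

# Dots are escaped so they match literally; [A-Z]+ requires at least one
# capital letter in the team abbreviation.
LOGO_RE = re.compile(r"http://sports\.cbsimg\.net/images/nba/logos/30x30/[A-Z]+\.png")


def fetch(url):
    """Download a page and decode it to str (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def logo_links(source):
    """Return matching logo URLs in page order, without duplicates."""
    seen = set()
    links = []
    for link in LOGO_RE.findall(source):
        if link not in seen:
            seen.add(link)
            links.append(link)
    return links


# Hypothetical sample markup standing in for the downloaded page.
sample = (
    '<img src="http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png">'
    '<img src="http://sports.cbsimg.net/images/nba/logos/30x30/CHA.png">'
    '<img src="http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png">'
)

for link in logo_links(sample):
    print(link)
```

Against the real page you would call `logo_links(fetch(url))`; to save the files, `urllib.request.urlretrieve(link, link.split('/')[-1])` plays the role of the Python 2 `urllib.urlretrieve` call.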
Hope this helps!
Source: "python - Extracting image links from a webpage using Python", Stack Overflow: https://stackoverflow.com/questions/11350464/