So let's say I'm trying to get the link to a particular image, like this:
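import os
import urllib2
import urlparse
from bs4 import BeautifulSoup

# BeautifulSoup parses markup, not URLs, so fetch the page first
html = urllib2.urlopen("http://examplesite.com").read()
soup = BeautifulSoup(html, "html.parser")
for image in soup.findAll("img"):
    src = image.get("src")  # raw src attribute, possibly relative
    srcd = urlparse.urlparse(src)
    path = srcd.path  # gets the path
    fn = os.path.basename(path)  # gets the filename
    # let's say the webpage I was scraping had its images like this:
    # <img src="../../someimage.jpg" />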
Is there an easy way to get the full URL, or do I have to use regular expressions?
Best answer
Use urlparse.urljoin:
>>> import urlparse
>>> base_url = "http://example.com/foo/"
>>> urlparse.urljoin(base_url, "../bar")
'http://example.com/bar'
>>> urlparse.urljoin(base_url, "/baz")
'http://example.com/baz'
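Applied to the question, each img src is joined against the URL of the page it came from (not just the site root), since "../" is resolved relative to that page's directory. The sketch below assumes Python 3, where urlparse.urljoin lives at urllib.parse.urljoin. To keep it self-contained it swaps BeautifulSoup for the standard-library html.parser; with BeautifulSoup the equivalent is urljoin(page_url, image.get("src")) inside the findAll("img") loop. The helper names and the example page URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin  # Python 3 home of urlparse.urljoin


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collect absolute URLs for every <img src=...>."""

    def __init__(self, page_url):
        super().__init__()
        # URL the HTML was fetched from; relative srcs resolve against it
        self.page_url = page_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # urljoin handles "../", root-relative "/..." and absolute URLs
            self.urls.append(urljoin(self.page_url, src))


def absolute_img_urls(html, page_url):
    """Return the absolute URL of each <img> in html, in document order."""
    parser = ImgSrcCollector(page_url)
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    # Hypothetical page URL and markup, for illustration only
    page = "http://examplesite.com/gallery/2012/index.html"
    html = ('<img src="../../someimage.jpg" />'
            '<img src="/static/logo.png">'
            '<img src="http://cdn.example.com/a.jpg">')
    for url in absolute_img_urls(html, page):
        print(url)
    # http://examplesite.com/someimage.jpg
    # http://examplesite.com/static/logo.png
    # http://cdn.example.com/a.jpg
```

Already-absolute srcs pass through urljoin unchanged, so no special-casing or regular expressions are needed. If the page declares a <base href="...">, that URL should be used as the base instead of the page URL.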
Regarding "python - BeautifulSoup: how to get the URL from an img src when it contains ../..?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13403691/