Python scraper unable to scrape img src

Question

I'm unable to scrape images from the website www.kissmanga.com. I'm using Python 3 with the Requests and BeautifulSoup libraries. The scraped image tags have a blank "src" attribute.

SRC:

from bs4 import BeautifulSoup
import cfscrape
import requests

# created for the cfscrape attempt described below; this request still uses plain requests
scraper = cfscrape.create_scraper()

url = "http://kissmanga.com/Manga/Bleach/Bleach-634--Friend-004?id=235206"

response = requests.get(url)

soup2 = BeautifulSoup(response.text, 'html.parser')

divImage = soup2.find('div', {"id": "divImage"})

for img in divImage.find_all('img'):
    print(img)

response.close()

I think image scraping is being blocked because I believe the website uses Cloudflare. Based on this assumption, I also tried using the "cfscrape" library to scrape the content.

Recommended answer

You need to wait for JavaScript to inject the HTML for the images.
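To illustrate the point, here is a minimal sketch that uses only the standard library. Everything in it is hypothetical: the two HTML snippets, the example.com URL, and the ImgSrcCollector and img_sources names are made up for the demonstration and are not taken from the real site. It assumes the server sends an empty container and that a script fills it in later.

```python
from html.parser import HTMLParser

# Hypothetical markup: what a plain HTTP fetch might return before any script runs.
STATIC_HTML = """
<div id="divImage"></div>
<script>
  // in a browser, a script like this would add the image tags to the div
  var pages = ["http://example.com/page-001.jpg"];
</script>
"""

# Hypothetical markup: the same container after a browser has executed the script.
RENDERED_HTML = '<div id="divImage"><img src="http://example.com/page-001.jpg"></div>'


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> start tag in the markup."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))


def img_sources(html):
    """Return the src values of all <img> tags found in the given HTML string."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


print(img_sources(STATIC_HTML))    # the view a plain HTTP client gets
print(img_sources(RENDERED_HTML))  # the view after the script has run
```

The parser never executes the script, so the first call finds no image URLs. Getting the second view requires a tool that actually runs the page's JavaScript.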

Several tools can do this; here are some of them:

  • Ghost
  • PhantomJS (Ghost Driver)
  • Selenium

I was able to get it working with Selenium:

from bs4 import BeautifulSoup

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

driver = webdriver.Firefox()
# it takes forever to load the page, therefore we are setting a threshold
driver.set_page_load_timeout(5)

try:
    driver.get("http://kissmanga.com/Manga/Bleach/Bleach-634--Friend-004?id=235206")
except TimeoutException:
    # never ignore exceptions silently in real world code
    pass

soup2 = BeautifulSoup(driver.page_source, 'html.parser')
divImage = soup2.find('div', {"id": "divImage"})

# close the browser
driver.close()

for img in divImage.find_all('img'):
    print(img.get('src'))
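A possible variation, sketched here under assumptions rather than taken from the original answer, is to wait explicitly for the injected images instead of relying only on the page load timeout. The function name fetch_image_sources, the IMAGE_SELECTOR constant, and the timeout values are all hypothetical. The sketch assumes Selenium 4 and a working Firefox driver, and reads the src values directly from the browser rather than through BeautifulSoup.

```python
# CSS selector for images inside div#divImage that have a non-empty src attribute.
IMAGE_SELECTOR = "#divImage img[src]:not([src=''])"


def fetch_image_sources(url, load_timeout=5, wait_timeout=30):
    """Open url in Firefox, wait for the injected images, and return their src values."""
    # Selenium is imported here so this module can still be imported without it.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.set_page_load_timeout(load_timeout)
        try:
            driver.get(url)
        except TimeoutException:
            # The full page load is slow; the images may already be present.
            pass
        # Block until at least one matching image exists, else raise TimeoutException.
        WebDriverWait(driver, wait_timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, IMAGE_SELECTOR))
        )
        images = driver.find_elements(By.CSS_SELECTOR, IMAGE_SELECTOR)
        return [image.get_attribute("src") for image in images]
    finally:
        # Always shut the browser down, even if the wait fails.
        driver.quit()
```

Note that the wait returns as soon as the first matching image appears, so a page that injects its images one by one may need a stricter condition.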

Refer to "How to download image using requests" if you also want to download these images.
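As a rough sketch of that download step, the helper below streams one image to disk. It is not the code from the referenced post. The names filename_from_url and download_image, the default file name, the chunk size, and the timeout are assumptions made for illustration. The session argument is expected to be a requests.Session (or any object with a compatible get method) created by the caller.

```python
import os
from urllib.parse import urlparse


def filename_from_url(url, default="image.jpg"):
    """Derive a local file name from the last path segment of the URL."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def download_image(session, url, dest_dir="."):
    """Stream one image into dest_dir via the given session and return its path."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, filename_from_url(url))
    # stream=True avoids loading the whole image into memory at once
    with session.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=8192):
                fh.write(chunk)
    return path
```

With Requests installed, a call could look like download_image(requests.Session(), src, "pages") for each src value printed above. The image host may expect headers or cookies that a fresh session does not have, so treat this as a starting point.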


09-05 12:16