Problem description
I am using the requests module to fetch the content of a web page, but when I look at the result, it does not contain the full content of the page.
Here is my code:
import requests
from bs4 import BeautifulSoup
url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
Also, in the Chrome web browser, if I view the page source, I do not see the full content either.
Is there a way to get the full content of the example page I provided?
Recommended answer
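The page is rendered with JavaScript, which makes additional requests to fetch more data after the initial HTML has loaded. requests only downloads that initial HTML and does not execute JavaScript, and Chrome's "View page source" likewise shows the HTML as sent by the server rather than the rendered page, so the extra content appears in neither. You can fetch the complete, rendered page with Selenium, which drives a real browser.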
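The difference can be seen in a small, self-contained sketch that uses only the standard library's html.parser (not BeautifulSoup). Both HTML strings are invented stand-ins, not Nordstrom's real markup: in one, the product element is written directly into the HTML; in the other, it exists only inside a script that would insert it at run time. A parser that only sees the raw markup, as requests plus BeautifulSoup does, finds the first and misses the second. The names ProductCounter and count_products are hypothetical helpers for this illustration.

```python
from html.parser import HTMLParser


class ProductCounter(HTMLParser):
    """Counts <p class="product"> start tags present in the markup itself."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag
        if tag == "p" and ("class", "product") in attrs:
            self.count += 1


def count_products(html: str) -> int:
    parser = ProductCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup: the product is part of the HTML the server sends.
STATIC_HTML = '<div id="list"><p class="product">Dress A</p></div>'

# Hypothetical markup: the product is only added by a script in the browser.
# html.parser treats <script> contents as raw text, so the <p> inside is not a tag.
SCRIPTED_HTML = """<div id="list"></div>
<script>
  document.getElementById("list").innerHTML = '<p class="product">Dress A</p>';
</script>"""

print(count_products(STATIC_HTML))    # 1
print(count_products(SCRIPTED_HTML))  # 0
```

Because nothing executes the script, the product never becomes part of the parsed document. A browser, and therefore Selenium, runs the script before you read page_source, which is why that route returns the full content.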
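from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"

# Start a Chrome session (recent Selenium 4 releases locate a matching chromedriver automatically)
driver = webdriver.Chrome()
try:
    driver.get(url)             # load the page and let the browser run its JavaScript
    html = driver.page_source   # HTML of the DOM as the browser currently sees it
finally:
    driver.quit()               # close the browser even if loading fails

soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())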
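One caveat: driver.get() returns once the initial page load completes, but data the page requests after that point may not be in page_source yet. A common remedy is an explicit wait with Selenium's WebDriverWait and expected_conditions, optionally running Chrome headless. The sketch below assumes that approach; the function name fetch_rendered_html and the CSS selector you pass to it are hypothetical, and you would need to inspect the page yourself to choose a selector that matches the items you want.

```python
def fetch_rendered_html(url: str, css_selector: str, timeout: float = 10.0) -> str:
    """Load url in headless Chrome, wait until css_selector matches, return the rendered HTML."""
    # Imported inside the function so this definition loads even where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one element matching the selector is in the DOM;
        # raises selenium.common.exceptions.TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser
```

Usage would look like html = fetch_rendered_html(url, "article"), where "article" is only a placeholder selector, followed by BeautifulSoup(html, 'html.parser') as before. Waiting for a specific element is generally more reliable than a fixed time.sleep(), because it returns as soon as the content is present and fails loudly if it never appears.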
For other solutions, see my answer to Scraping Google Finance (BeautifulSoup).