Problem description
I am new to the web-scraping game. I am trying to scrape the following website: http://www.foodemissions.com/foodemissions/Calculator.aspx
Using resources found on the Internet, I put together the following HTTP POST request:
import urllib
from bs4 import BeautifulSoup
headers = {
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17',
'Content-Type': 'application/x-www-form-urlencoded',
'Accept-Encoding': 'gzip,deflate,sdch',
'Accept-Language': 'en-US,en;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}
class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'
myopener = MyOpener()
url = 'http://www.foodemissions.com/foodemissions/Calculator.aspx'
# first HTTP request without form data
f = myopener.open(url)
soup_dummy = BeautifulSoup(f,"html5lib")
# parse and retrieve two vital form values
viewstate = soup_dummy.select("#__VIEWSTATE")[0]['value']
viewstategen = soup_dummy.select("#__VIEWSTATEGENERATOR")[0]['value']
soup_dummy.find(id="ctl00_MainContent_category")
#search for the string 'input' to find the form data
formData = (
('__VIEWSTATE', viewstate),
('__VIEWSTATEGENERATOR', viewstategen),
('ctl00$MainContent$transport', '200'),
('ctl00$MainContent$quantity','1'),
('ctl00$MainContent$wastepct','100')
)
encodedFields = urllib.urlencode(formData)
# second HTTP request with form data
f = myopener.open(url, encodedFields)
soup = BeautifulSoup(f,"html5lib")
trans_emissions = soup.find("span", id="ctl00_MainContent_transEmissions")
print(trans_emissions.text)
The output of my final print command doesn't seem to change even when I change the value of the ctl00$MainContent$transport element. Any pointers on why this is the case?
Thanks!
Recommended answer
You need to make the ASP.NET app "think" that you clicked the Calculate button by adding the button's name to the __EVENTTARGET hidden input. Without it, the server treats the POST as a plain page load and returns the same output regardless of the field values you send.
formData = (
('__VIEWSTATE', viewstate),
('__VIEWSTATEGENERATOR', viewstategen),
('ctl00$MainContent$transport', '100'),
('ctl00$MainContent$quantity','150'),
('ctl00$MainContent$wastepct','200'),
('__EVENTTARGET', 'ctl00$MainContent$calculate')
)
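For readers on Python 3, where urllib.FancyURLopener and urllib.urlencode are no longer available, the same idea can be sketched with the standard library alone. This is only an illustration under a few assumptions: the helper names (HiddenInputParser, build_postback) and the SAMPLE_HTML fragment are made up for the example, and copying every hidden input rather than just the two view-state fields is a design choice of the sketch, not something the original answer requires.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, user_fields):
    """Return the form dict for a simulated button click (hypothetical helper)."""
    parser = HiddenInputParser()
    parser.feed(html)
    form = dict(parser.fields)            # __VIEWSTATE, __VIEWSTATEGENERATOR, ...
    form.update(user_fields)              # the values we want to submit
    form["__EVENTTARGET"] = event_target  # name of the control that was "clicked"
    return form


# Made-up fragment standing in for the response to the first GET request.
SAMPLE_HTML = """
<form>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="DEF456" />
<input type="text" name="ctl00$MainContent$transport" value="0" />
</form>
"""

form = build_postback(
    SAMPLE_HTML,
    "ctl00$MainContent$calculate",
    {
        "ctl00$MainContent$transport": "100",
        "ctl00$MainContent$quantity": "150",
        "ctl00$MainContent$wastepct": "200",
    },
)

# urlencode percent-encodes the "$" in the control names; the POST body must be bytes.
body = urlencode(form).encode("ascii")
print(body.decode("ascii"))
```

Against the real site, the HTML would come from the first GET request, and body would be sent with urllib.request.urlopen(url, data=body) before parsing the response as in the question.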