
Problem description

I'm trying to use mechanize to grab prices for New York's Metro-North Railroad from this site:
http://as0.mta.info/mnr/fares/choosestation.cfm

The problem is that when you select the first option (the origin station), the site uses JavaScript to populate the list of possible destinations. I have written equivalent code in Python, but I can't get it all working. Here's what I have so far:

import mechanize
import cookielib
from bs4 import BeautifulSoup

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

br.open("http://as0.mta.info/mnr/fares/choosestation.cfm")

br.select_form(name="form1")
br.form.set_all_readonly(False)

origin_control = br.form.find_control("orig_stat", type="select")
origin_control_list = origin_control.items
origin_control.value = [origin_control.items[0].name]

destination_control_list = reFillList(0, origin_control_list)

destination_control = br.form.find_control("dest_stat", type="select")
destination_control.items = destination_control_list
destination_control.value = [destination_control.items[0].name]

response = br.submit()
response_text = response.read()
print response_text

I haven't included the code for reFillList() because it's long, but assume it correctly creates a list of mechanize.option objects. Python doesn't complain about anything, but on submit I get the HTML for this alert:

"Fare information for travel between two lines is not available on-line. Please contact our Customer Information Center at 511 and ask to speak to a representative for further information."

Am I missing something here? Thanks for all the help!

Recommended answer

If you know the station IDs, it is easier to POST the request yourself:

import mechanize
import urllib

post_url = 'http://as0.mta.info/mnr/fares/get_fares.cfm'

orig = 295 #BEACON FALLS
dest = 292 #ANSONIA

params = urllib.urlencode({'dest_stat':dest, 'orig_stat':orig })
rq = mechanize.Request(post_url, params)

fares_page = mechanize.urlopen(rq)

print fares_page.read()
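
The snippet above is written for Python 2. On Python 3, `urllib.urlencode` lives at `urllib.parse.urlencode`, so the form body could be built roughly as sketched below. The field names `orig_stat` and `dest_stat` and the two station IDs are taken from the answer; the helper name `build_fare_query` is hypothetical, and the sketch only builds the encoded body without contacting the site.

```python
from urllib.parse import urlencode


def build_fare_query(orig, dest):
    """Return the URL-encoded form body for a fare lookup.

    Illustrative helper only (not part of mechanize); the field
    names are the ones used in the answer's code.
    """
    # A list of pairs keeps the field order explicit.
    return urlencode([("orig_stat", orig), ("dest_stat", dest)])


if __name__ == "__main__":
    body = build_fare_query(295, 292)  # BEACON FALLS to ANSONIA
    print(body)
    # Python 3 HTTP clients generally expect POST data as bytes.
    print(body.encode("ascii"))
```

Keeping the encoding step in its own function makes it easy to check the body before sending any request.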

If you have the code to find the list of destination IDs for a given starting ID (i.e. a variant of reFillList()), you can then run this request for each combination:

import mechanize
import urllib, urllib2
from bs4 import BeautifulSoup

url = 'http://as0.mta.info/mnr/fares/choosestation.cfm'
post_url = 'http://as0.mta.info/mnr/fares/get_fares.cfm'

def get_fares(orig, dest):
    params = urllib.urlencode({'dest_stat':dest, 'orig_stat':orig })
    rq = mechanize.Request(post_url, params)

    fares_page = mechanize.urlopen(rq)
    print(fares_page.read())

# name the parser explicitly so bs4 does not have to guess one
pool = BeautifulSoup(urllib2.urlopen(url).read(), 'html.parser')

#let's keep our stations organised
stations = {}

# dict by station id
for option in pool.find('select', {'name':'orig_stat'}).findChildren():
    stations[option['value']] = {'name':option.string}

#iterate over all routes
for origin in stations:
    destinations = get_list_of_dests(origin) #use your code for this
    stations[origin]['dests'] = destinations

    for destination in destinations:
        print('Processing from %s to %s' % (origin, destination))
        get_fares(origin, destination)
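
The station-collection and route-iteration steps above can be tried offline with a minimal Python 3 sketch that uses only the standard library. Everything here is an assumption for illustration: `SAMPLE_HTML` is a made-up fragment rather than the real page markup, and `StationOptionParser`, `parse_stations`, `iter_routes` and the `get_dests` callback (which stands in for `get_list_of_dests`) are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical fragment shaped like the form described in the question.
SAMPLE_HTML = """
<form name="form1">
  <select name="orig_stat">
    <option value="">Select a station</option>
    <option value="295">BEACON FALLS</option>
    <option value="292">ANSONIA</option>
  </select>
  <select name="dest_stat">
    <option value="1">OTHER</option>
  </select>
</form>
"""


class StationOptionParser(HTMLParser):
    """Collect {value: label} for <option> tags inside one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.stations = {}
        self._in_select = False
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == self.select_name:
            self._in_select = True
        elif tag == "option" and self._in_select:
            # Skip placeholder entries that carry no station ID.
            self._current = attrs.get("value") or None
            if self._current is not None:
                self.stations[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.stations[self._current] += data.strip()

    def handle_endtag(self, tag):
        if tag == "select":
            self._in_select = False
            self._current = None
        elif tag == "option":
            self._current = None


def parse_stations(html, select_name="orig_stat"):
    """Return a dict mapping station ID to station name."""
    parser = StationOptionParser(select_name)
    parser.feed(html)
    return parser.stations


def iter_routes(stations, get_dests):
    """Yield (origin, destination) pairs; get_dests is supplied by the caller."""
    for origin in stations:
        for destination in get_dests(origin):
            yield origin, destination


if __name__ == "__main__":
    stations = parse_stations(SAMPLE_HTML)
    # Stand-in destination lookup: every other station in the list.
    others = lambda origin: [s for s in stations if s != origin]
    for origin, destination in iter_routes(stations, others):
        print("Processing from %s to %s" % (origin, destination))
```

Separating the pair generation from the request makes it possible to check which routes would be queried before hitting the server.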
