Problem description
I am working on a web scraping project which would scrape ticketing information off a travel website.
I am currently encountering an issue where the search parameters defined in my VBA code, and later entered into the website, do not work. The code I have written is provided below. For some background: I read the from/to destinations from my Excel workbook (e.g. Beijing(北京)) and define the travel date in the same format (MM-DD-YYYY) that the website expects. However, when the code runs, the site does not seem to recognize the parameters and redirects me to a page saying "site under maintenance". Oddly, when I enter the same parameters manually, the site recognizes them and returns ticketing information.
Am I missing something? Do I have to update other values besides "DepartureCity", "ArrivalCity", and "DepartDate"?
I also noticed that when I loop through multiple cities, the site searches with the parameters from an earlier search (e.g. when searching Shanghai -> Beijing, it returns Tianjin -> Beijing, which I had searched for previously). Is there a way to automatically clear the search history/cache via VBA?
' cntDays (the number of days out to search) is defined by the user
Sub ScrapeCtripTrains(cntDays As Long)
    Dim oIE As Object, ElementCol As Object, btnInput As Object
    Dim sFrom As String, sTo As String, sDate As String, sURL As String
    Dim dtRange As Date
    Dim i As Long

    ' save from and to destinations under a defined string
    sFrom = Range("C3").Value
    sTo = Range("C4").Value

    ' "i" tracks the # of days out
    For i = 0 To cntDays
        dtRange = Date + i
        ' establish date to pull train ticketing information on,
        ' as zero-padded MM-DD-YYYY (e.g. 12-09-2015)
        sDate = Format(dtRange, "mm-dd-yyyy")

        ' instantiate the oIE object
        Set oIE = CreateObject("InternetExplorer.Application")
        ' open Ctrip travel portal
        sURL = "http://english.ctrip.com/trains/#ctm_ref=nb_tn_top"
        With oIE
            .navigate sURL
            .Visible = True
            ' wait for the page to finish loading (4 = READYSTATE_COMPLETE)
            Do Until (.readyState = 4 And Not .Busy)
                DoEvents
            Loop

            ' search for particular entry
            .document.getElementsByName("DepartureCity")(0).Value = sFrom
            .document.getElementsByName("ArrivalCity")(0).Value = sTo
            .document.getElementsByName("DepartDate")(0).Value = sDate
            ' debug: show the values being entered
            MsgBox sFrom
            MsgBox sTo
            MsgBox sDate

            ' click the button labelled "Search"
            Set ElementCol = .document.getElementsByTagName("button")
            For Each btnInput In ElementCol
                If btnInput.innerText = "Search" Then
                    btnInput.Click
                    Exit For
                End If
            Next btnInput

            ' ensure page has been fully loaded
            Do Until (.readyState = 4 And Not .Busy)
                DoEvents
            Loop
            ' (reading the results page is not shown in the original post)
        End With
    Next i
End Sub
Recommended answer
Looking at this a little closer, the site uses a GET request to perform the search.
As such, there is no need to load the page, populate the fields, and click the button.
You can set the values in the URL directly and bypass the initial page.
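A minimal sketch of the GET approach, written in Python for brevity (in the VBA code the resulting string would simply be passed to .navigate): it builds one search URL per travel date instead of loading the form, filling it in and clicking Search. The base URL and the query keys are assumptions; the keys just reuse the form's name attributes, and the answer notes below that the real search requires more fields than these three. Because every request carries its own parameters, this may also avoid the stale-search behaviour described in the question.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def search_urls(base_url, from_city, to_city, start, cnt_days):
    """Return one GET search URL per travel date, start .. start + cnt_days.

    Assumption: the GET parameter names match the form's name attributes
    (DepartureCity, ArrivalCity, DepartDate); the real query needs more fields.
    """
    urls = []
    for i in range(cnt_days + 1):  # mirrors "For i = 0 To cntDays"
        travel_day = start + timedelta(days=i)
        query = urlencode({
            "DepartureCity": from_city,
            "ArrivalCity": to_city,
            # zero-padded MM-DD-YYYY, the format the date box expects
            "DepartDate": travel_day.strftime("%m-%d-%Y"),
        })  # urlencode percent-encodes "(", ")" and the Chinese characters
        urls.append(base_url + "?" + query)
    return urls


if __name__ == "__main__":
    # placeholder base URL; substitute the search URL the site actually requests
    for url in search_urls("https://example.invalid/trains/search",
                           "Shanghai(上海)", "Beijing(北京)", date(2015, 12, 9), 2):
        print(url)
```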
For instance, to search for trains going from Shanghai to Beijing on December 9, 2015 (12-09-2015), load the following URL...
Broken down, it looks like this...
From my own testing, each of the above fields is required; leaving any of them out gets you the "maintenance" screen...
This means you also need to know the station codes.
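Since a missing field silently lands on the "maintenance" screen, it can help to check the parameters before sending a request. A small sketch, again in Python; the two station-code field names are hypothetical placeholders, because the actual names come from the site's search URL:

```python
# Hypothetical field names: adjust them to the names in the site's search URL.
REQUIRED_FIELDS = (
    "DepartureCity",
    "ArrivalCity",
    "DepartDate",
    "DepartureStationCode",  # hypothetical name for the departure station code
    "ArrivalStationCode",    # hypothetical name for the arrival station code
)


def missing_fields(params):
    """Return the required fields that are absent or empty in params."""
    return [name for name in REQUIRED_FIELDS if not params.get(name)]


if __name__ == "__main__":
    params = {"DepartureCity": "Shanghai(上海)",
              "ArrivalCity": "Beijing(北京)",
              "DepartDate": "12-09-2015"}
    missing = missing_fields(params)
    if missing:
        # skip the request rather than landing on the "maintenance" page
        print("not searching, missing:", ", ".join(missing))
```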
In addition, you must supply the special characters in the names, URL-encoded...
Shanghai%28%E4%B8%8A%E6%B5%B7%29
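The encoding is ordinary UTF-8 percent-encoding, so it does not have to be worked out by hand. For example, Python's urllib.parse.quote converts a workbook-style name such as Shanghai(上海) into its encoded form:

```python
from urllib.parse import quote, unquote

name = "Shanghai(上海)"  # city name as stored in the workbook, e.g. Beijing(北京)

# quote() encodes the text as UTF-8 and percent-escapes every byte outside
# the unreserved set, so "(" -> %28, ")" -> %29 and each Chinese
# character becomes three %XX escapes.
encoded = quote(name)
print(encoded)  # Shanghai%28%E4%B8%8A%E6%B5%B7%29

# unquote() reverses it, which is handy for checking a value copied
# from the browser's address bar
assert unquote(encoded) == name
```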