问题描述
我在抓取这个特定网页的数据时遇到了一个可怕的时间...基本上,当我在浏览器中加载 URL 并手动按 F12 时,我可以在DOM 资源管理器"中看到我需要的信息,但是当我以编程方式尝试时做同样的事情(见下文),HTMLDoc 不包含我在DOM 资源管理器"中看到的相同信息......
公共子 testCode()将 IE 调暗为 SHDocVw.InternetExplorer将 HTMLDoc 变暗为 MSHTML.HTMLDocument设置 IE = 新的 SHDocVw.InternetExplorer用 IE.navigate "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW"而 .Busy = True 或 .ReadyState <>READYSTATE_COMPLETE:温德设置 HTMLDoc = .Document结束于结束子
有人可以帮我访问DOM 资源管理器"中的信息吗?我知道 HTML 并不总是您在浏览器中看到的内容,而是创建您可以在浏览器中看到的内容的说明,但是必须有一种方法可以从 HTML 以编程方式创建 DOM...
此外,我相信我所追求的数据是由脚本或 iFrame 生成的,但我一直无法生成我正在寻找的数据,因为它们都被弄乱了......
更新
请参阅下面的 DOM Explorer 图片:
大纲:
将 GET XHR 发送到
主要有6个部分,数据的相关部分被提取并输出到6个工作表(运行前必须手动创建):
Sheet1 - 每日预测Sheet2 - Horly 预测Sheet3 - 响应数据(转置)Sheet4 - 当前数据(转置)Sheet5 - 天文学(转置)Sheet6 - 每小时历史数据
有了这个例子,你就可以从那个 JSON 响应中提取你需要的数据.
I have having a horrible time scraping this particular webpage's data... Basically I can see the information that I need in the 'DOM Explorer' when I load the URL in a browser and hit F12 manually, but when I programmatically attempt to do the same (see below) the HTMLDoc does not contain the same information that I can see in the 'DOM Explorer'...
Public Sub testCode() Dim IE As SHDocVw.InternetExplorer Dim HTMLDoc As MSHTML.HTMLDocument Set IE = New SHDocVw.InternetExplorer With IE .navigate "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW" While .Busy = True Or .ReadyState <> READYSTATE_COMPLETE: Wend Set HTMLDoc = .Document End With End Sub
Can someone please help me access the information in the 'DOM Explorer'? I know that HTML is not always what you see in a browser, but rather the instructions to create what you can see in the browser, but then there must be a way to programmatically create the DOM from the HTML...
Also, I believe that the data I am after is being generated by scripts or iFrames, but I have been unable to generate the data I am looking for from messing around with either....
UPDATE
See picture of DOM Explorer below:
解决方案The outline:
Make GET XHR to https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW.
Extract the location from the HTML response line
var query = 'zmw:' + '00000.271.03969';
and the key from the linesvar citypage_options = {k: 'c991975b7f4186c0', ...
.Make GET XHR using the location
00000.271.03969
and the keyc991975b7f4186c0
to https://api-ak-aws.wunderground.com/api/c991975b7f4186c0/forecast10day/hourly10day/labels/conditions/astronomy10day/lang:en/units:metric/v:2.0/bestfct:1/q/zmw:00000.271.03969.json.Parse JSON response (with e. g. VBA JSON parser) using
Parse()
, convert the necessary data usingToArray()
, and output as tables to the worksheet(s).
Actually web browser do almost the same stuff each time you open that webpage.
You may use the below VBA code to parse response and output result. Import JSON.bas module into the VBA project for JSON processing.
Sub TestScrapeWunderground() Dim sContent As String Dim sKey As String Dim sLocation As String Dim vJSON As Variant Dim sState As String Dim oDays As Object Dim oHours As Object Dim vDay As Variant Dim vHour As Variant Dim aRows() As Variant Dim aHeader() As Variant ' GET XHR to retrieve location and key With CreateObject("MSXML2.ServerXMLHTTP") .Open "GET", "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW", False .Send sContent = .responseText End With ' Extract location and key from HTML content sLocation = Split(Split(sContent, "var query = 'zmw:' + '", 2)(1), "'", 2)(0) sKey = Split(Split(sContent, vbTab & "k: '", 2)(1), "'", 2)(0) ' GET XHR to retrieve JSON data With CreateObject("MSXML2.ServerXMLHTTP") .Open "GET", "https://api-ak-aws.wunderground.com/api/" & sKey & "/forecast10day/hourly10day/labels/conditions/astronomy10day/lang:en/units:metric/v:2.0/bestfct:1/q/zmw:" & sLocation & ".json", False .Send sContent = .responseText End With ' Parse JSON response to data structure JSON.Parse sContent, vJSON, sState ' Populate dictionaries with daily and hourly forecast data Set oDays = CreateObject("Scripting.Dictionary") Set oHours = CreateObject("Scripting.Dictionary") For Each vDay In vJSON("forecast")("days") oDays(vDay("summary")) = "" For Each vHour In vDay("hours") oHours(vHour) = "" Next Next ' Convert daily forecast data to arrays JSON.ToArray oDays.Keys(), aRows, aHeader ' Output daily forecast data to table With Sheets(1) .Cells.Delete OutputArray .Cells(1, 1), aHeader Output2DArray .Cells(2, 1), aRows .Columns.AutoFit End With ' Convert hourly forecast data to arrays JSON.ToArray oHours.Keys(), aRows, aHeader ' Output hourly forecast data to table With Sheets(2) .Cells.Delete OutputArray .Cells(1, 1), aHeader Output2DArray .Cells(2, 1), aRows .Columns.AutoFit End With ' Convert response data to arrays JSON.ToArray Array(vJSON("response")), aRows, aHeader ' Output response transposed data to table With Sheets(3) .Cells.Delete Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader) Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows) .Columns.AutoFit End With ' Convert current data to arrays JSON.ToArray Array(vJSON("current_observation")), aRows, aHeader ' Output current transposed data to table With Sheets(4) .Cells.Delete Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader) Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows) .Columns.AutoFit End With ' Populate dictionary with daily astronomy data Set oDays = CreateObject("Scripting.Dictionary") For Each vDay In vJSON("astronomy")("days") oDays(vDay) = "" Next ' Convert daily astronomy data to arrays JSON.ToArray oDays.Keys(), aRows, aHeader ' Output daily astronomy transposed data to table With Sheets(5) .Cells.Delete Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader) Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows) .Columns.AutoFit End With ' Convert hourly history data to arrays JSON.ToArray vJSON("history")("days")(0)("hours"), aRows, aHeader ' Output hourly history data to table With Sheets(6) .Cells.Delete OutputArray .Cells(1, 1), aHeader Output2DArray .Cells(2, 1), aRows .Columns.AutoFit End With MsgBox "Completed" End Sub Sub OutputArray(oDstRng As Range, aCells As Variant) With oDstRng .Parent.Select With .Resize( _ 1, _ UBound(aCells) - LBound(aCells) + 1) .NumberFormat = "@" .Value = aCells End With End With End Sub Sub Output2DArray(oDstRng As Range, aCells As Variant) With oDstRng .Parent.Select With .Resize( _ UBound(aCells, 1) - LBound(aCells, 1) + 1, _ UBound(aCells, 2) - LBound(aCells, 2) + 1) .NumberFormat = "@" .Value = aCells End With End With End Sub
The second XHR returns JSON data, to make it clear how the necessary data is being extracted from it, you may save the JSON to file, copy the contents and paste it to any JSON viewer for further study. I use online tool http://jsonviewer.stack.hu, root element structure is shown below:
There are 6 main sections, the relevant part of the data is extracted and output to 6 worksheets (which have to be created manually before run):
Sheet1 - Daily forecast Sheet2 - Horly forecast Sheet3 - Response data (transposed) Sheet4 - Current data (transposed) Sheet5 - Astronomy (transposed) Sheet6 - Hourly history data
Having that example you can extract the data you need from that JSON response.
这篇关于使用 VBA 进行网页抓取(当 HTML <> DOM 时)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!