Unable to set object variable in HTMLDocument to scrape data from a website


Problem description


I have seen this question posted before, but my situation is somewhat different, so I was hoping to get some help from the community and maybe a fresh perspective. I have a macro written in VBA that's supposed to pull return data from this company's online database: returns for the MSCI World Index, the S&P 500, etc. The code I have works on other pages, but I think this one is different. I talked to the webmaster, and he told me that the site was not designed to be scraped, but that scraping is not restricted by their usage policy. It would be a huge time saver for me if I could get the data by scraping, so I'm trying really hard to figure out a way to do it. I've tagged this under JavaScript as well, as I think the code would be very similar and I want to be open to as many solutions as possible.

The situation is this: the following code throws an "Object variable not set" error when it gets to the actual scraping of the data (the line that begins "Set els = htmlDoc..."). I've tried many combinations of the getElement(s) functions, thinking that might be the problem, but I've drawn a blank. Does anybody know any other way to set the object variable in this environment? Or just any other creative way to pull the data?

I can't give out the login info, but I think that just by navigating to the 'caRetPage' URL you can see the HTML I'm trying to scrape/parse.

Sub caScrape()

Dim ie As Object        'ie: internet explorer
Dim htmlDoc As MSHTML.HTMLDocument
Dim els As Object   'to store html objects
Dim rtn As String   'to store values to be scraped from page
Dim loginButton As Object
caLoginPage = "https://members.cambridgeassociates.com/Login/Forms/login-form.asp"
caRetPage = "https://members.cambridgeassociates.com/markets/marketindexsnapshot/DailyMarketReturnsUS.asp"
caUser = "xxxxx"
caPass = "xxxxx"
Set ie = CreateObject("internetexplorer.application")
ie.Visible = True
ie.navigate caLoginPage
While ie.Busy
    DoEvents
Wend
Do Until ie.readyState = 4
    DoEvents
Loop
Set htmlDoc = ie.document
'Log in to site
Set loginButton = htmlDoc.getElementsByTagName("button").Item(0)
With htmlDoc
    .all("Username").Value = caUser
    .all("Password").Value = caPass
    loginButton.Click
End With
While ie.Busy
    DoEvents
Wend
Set acceptButton = htmlDoc.getElementsByName("Submit").Item(0)
acceptButton.Click
While ie.Busy
    DoEvents
Wend


'Here is the page with the return data on it
ie.navigate caRetPage
While ie.Busy
    DoEvents
Wend
Do Until ie.readyState = 4
    DoEvents
Loop
Set htmlDoc = ie.document

'This next line is where the error gets thrown
Set els = htmlDoc.getElementById("tblData")(0).getElementByTagName("tr")(5).getElementByTagName("td")(1)
    'Also tried the following and plenty of variations of getElement command
'Set els = htmlDoc.getElementsByTagName("body")(0).getElementsByTagName("table")(2).getElementsByTagName("tbody")(0).getElementByTagName("tr")(5).getElementByTagName("td")(1)

rtn = els.innerText
Debug.Print(rtn)


End Sub

Any help would be greatly appreciated.

Solution

I believe I have a working example with the URL provided in the code posted above. From what I can tell, the site uses frames, so the table is not in the top-level document and you need to go through the frame's document to reach it.

In addition, waiting for the page to load needed a different method. For that I reused some code I got from another Stack Overflow answer. Basically, it waits until the number of tags on the page stops changing while the page is loading.

Here's the code:

#If VBA7 Then
    'Sleep takes a 32-bit DWORD, so the argument stays Long even on 64-bit Office
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Sub getData()
    Dim element     As Object
    Dim MyURL       As String
    MyURL = "https://members.cambridgeassociates.com/markets/marketindexsnapshot/DailyMarketReturnsUS.asp"

    'Late binding open IE
    Dim MyBrowser   As Object: Set MyBrowser = CreateObject("InternetExplorer.Application")
    MyBrowser.Visible = True
    MyBrowser.navigate MyURL

    waitforload MyBrowser

    'The page is a frameset: take the document of its second child frame
    Set element = MyBrowser.document.getElementsByTagName("Frameset")(0).Children(1).contentDocument
    Set element = element.getElementById("tblData")
    'Row index 5, then cell index 1 (both zero-based)
    Set element = element.getElementsByTagName("tr")(5)
    Set element = element.getElementsByTagName("td")(1)
    Debug.Print element.innerText
End Sub

Private Sub waitforload(ByRef ie As Object)
    Dim i        As Byte
    Dim tagnames As Long

    While ie.Busy
        Sleep 250
        DoEvents
    Wend

    While ie.ReadyState <> 4
        Sleep 250
        DoEvents
    Wend

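    'Sample the total tag count, then re-check it every 75 ms;
    'once a re-check matches the sample, treat the page as loaded
    Do
        tagnames = ie.document.getElementsByTagName("*").Length
        For i = 1 To 5
            Sleep 75
            If tagnames = ie.document.getElementsByTagName("*").Length Then Exit Sub
        Next
    Loop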
End Sub

This should return: 0.10
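
The frame handling above can also be written defensively, so that a missing element produces a message instead of the "Object variable not set" error. What follows is only a sketch under some assumptions: FindById and PrintReturn are hypothetical names that are not part of the original answer, the search only goes one level deep into frame and iframe elements, and the browser object passed in is assumed to have finished loading (for example via waitforload).

```vba
Option Explicit

' Hypothetical helper (not from the original answer): look for an element by ID
' in the top-level document first, then in the document of each frame/iframe.
' Returns Nothing when the element is not found anywhere it looked.
Function FindById(ByVal doc As Object, ByVal id As String) As Object
    Dim tagName As Variant
    Dim frames As Object
    Dim found As Object
    Dim i As Long

    Set found = doc.getElementById(id)
    If Not found Is Nothing Then
        Set FindById = found
        Exit Function
    End If

    For Each tagName In Array("frame", "iframe")
        Set frames = doc.getElementsByTagName(tagName)
        For i = 0 To frames.Length - 1
            Set found = Nothing
            ' A frame's document may be inaccessible (e.g. another domain),
            ' so skip any frame that raises an error.
            On Error Resume Next
            Set found = frames.Item(i).contentDocument.getElementById(id)
            On Error GoTo 0
            If Not found Is Nothing Then
                Set FindById = found
                Exit Function
            End If
        Next i
    Next tagName

    Set FindById = Nothing
End Function

' Hypothetical caller: prints row 5, cell 1 of tblData, checking each step
' instead of chaining the calls on one line.
Sub PrintReturn(ByVal browser As Object)
    Dim tbl As Object
    Dim rows As Object
    Dim cells As Object

    Set tbl = FindById(browser.document, "tblData")
    If tbl Is Nothing Then
        Debug.Print "tblData not found in the document or its frames"
        Exit Sub
    End If

    Set rows = tbl.getElementsByTagName("tr")
    If rows.Length <= 5 Then
        Debug.Print "Table has fewer rows than expected"
        Exit Sub
    End If

    Set cells = rows.Item(5).getElementsByTagName("td")
    If cells.Length <= 1 Then
        Debug.Print "Row has fewer cells than expected"
        Exit Sub
    End If

    Debug.Print cells.Item(1).innerText
End Sub
```

Called as PrintReturn MyBrowser after waitforload MyBrowser, this follows the same path as getData but stops with a message at the first lookup that comes back empty, which makes it easier to see which step failed.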


08-22 20:47