Problem Description
I need to access the same web page with different "keys" to get specific content it provides. I have a list of keys x, and I use the GET command from the httr package to access the web page and then retrieve the information I need, y:

library(httr)
library(stringr)
library(XML)

for (i in 1:20) {
  h1 <- GET(paste0("http:....categories=&query=", x[i]), timeout(10))
  par <- htmlParse(file = h1)
  y[i] <- xpathSApply(doc = par, path = "//h3/a", fun = xmlValue)
}
The problem is that the timeout is often reached, which disrupts the loop.
So I would like to refresh the web page, or retry the GET command, when the timeout is reached, because I suspect the problem is with the internet connection of the website I am trying to access.
The way my code works, a timeout breaks the loop. I need to either ignore the error and go to the next iteration, or retry accessing the website.
Solution

Look at purrr::safely(). You can wrap GET as such:

safe_GET <- purrr::safely(GET)

This removes the ugliness of tryCatch() by letting you do:

resp <- safe_GET("http://example.com") # you can use all legal `GET` params
And you can test resp$result for NULL. Put that into your retry loop and you're good to go.
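As an illustration, here is a minimal sketch of what such a retry loop could look like. The three-attempt budget, the 2-second pause, and the example URL are assumptions for the sketch, not part of the original answer:

library(httr)

# wrap GET so failures are captured in a list instead of being thrown
safe_GET <- purrr::safely(GET)

resp <- NULL
for (attempt in 1:3) {          # up to 3 attempts (assumed retry budget)
  res <- safe_GET("http://example.com", timeout(10))
  if (!is.null(res$result)) {   # a non-NULL result means the request succeeded
    resp <- res$result
    break
  }
  Sys.sleep(2)                  # brief pause before the next attempt
}

You can see this in action by doing: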
str(safe_GET("https://httpbin.org/delay/3", timeout(1)))
which will ask the httpbin service to wait 3 seconds before responding, but sets an explicit timeout of 1 second on the GET request. I wrapped it in str() to show the result:

List of 2
 $ result: NULL
 $ error :List of 2
  ..$ message: chr "Timeout was reached"
  ..$ call   : language curl::curl_fetch_memory(url, handle = handle)
  ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
So, you can even check the message if you need to.
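If you need to branch on the message, here is a small sketch that detects whether the failure really was a timeout. The grepl("Timeout", ...) match is an assumption based on the "Timeout was reached" wording in the str() output above:

# safe_GET as defined earlier: purrr::safely(GET)
res <- safe_GET("https://httpbin.org/delay/3", timeout(1))
if (!is.null(res$error) && grepl("Timeout", conditionMessage(res$error))) {
  # the captured condition carries the message shown in the str() output
  message("Request timed out; a retry is worthwhile.")
}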