Problem Description
I need to access the same web page with different "keys" to get specific content it provides. I have a list of keys x, and I use the GET command from the httr package to access the web page and then retrieve the information I need, y:

library(httr)
library(stringr)
library(XML)

for (i in 1:20) {
  h1 <- GET(paste0("http:....categories=&query=", x[i]), timeout(10))
  par <- htmlParse(file = h1)
  y[i] <- xpathSApply(doc = par, path = "//h3/a", fun = xmlValue)
}
The problem is that the timeout is often reached, which disrupts the loop.
So I would like to refresh the web page, or retry the GET command, when the timeout is reached, because I suspect the problem is with the internet connection of the website I am trying to access.
The way my code works, a timeout breaks the loop. I need to either ignore the error and go to the next iteration, or retry accessing the website.
Solution

Look at purrr::safely(). You can wrap GET as such:

safe_GET <- purrr::safely(GET)

This removes the ugliness of tryCatch() by letting you do:

resp <- safe_GET("http://example.com") # you can use all legal `GET` params
And you can test resp$result for NULL. Put that into your retry loop and you're good to go.
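As an illustration, here is a minimal sketch of what such a retry loop could look like. The three-attempt budget, the 2-second pause, and the example URL are assumptions for the sketch, not part of the original answer:

library(httr)

# wrap GET so failures are captured in a list instead of being thrown
safe_GET <- purrr::safely(GET)

resp <- NULL
for (attempt in 1:3) {          # up to 3 attempts (assumed retry budget)
  res <- safe_GET("http://example.com", timeout(10))
  if (!is.null(res$result)) {   # a non-NULL result means the request succeeded
    resp <- res$result
    break
  }
  Sys.sleep(2)                  # brief pause before the next attempt
}

You can see this in action by doing: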
str(safe_GET("https://httpbin.org/delay/3", timeout(1)))
which will ask the httpbin service to wait 3 seconds before responding, but sets an explicit timeout of 1 second on the GET request. I wrapped it in str() to show the result:

List of 2
 $ result: NULL
 $ error :List of 2
  ..$ message: chr "Timeout was reached"
  ..$ call   : language curl::curl_fetch_memory(url, handle = handle)
  ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
So, you can even check the message if you need to.
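If you need to branch on the message, here is a small sketch that detects whether the failure really was a timeout. The grepl("Timeout", ...) match is an assumption based on the "Timeout was reached" wording in the str() output above:

# safe_GET as defined earlier: purrr::safely(GET)
res <- safe_GET("https://httpbin.org/delay/3", timeout(1))
if (!is.null(res$error) && grepl("Timeout", conditionMessage(res$error))) {
  # the captured condition carries the message shown in the str() output
  message("Request timed out; a retry is worthwhile.")
}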