rvest: how to avoid "Error in open.connection(x, "rb"): HTTP error 404" in R


Problem description

I'd like to extract some information from a list of websites. I have a list of URLs, but some of them don't work or don't exist.

The error is:

Error in open.connection(x, "rb") : HTTP error 404.

library(rvest)
url_web<-(c("https://it.wikipedia.org/wiki/Roma",
        "https://it.wikipedia.org/wiki/Milano",
        "https://it.wikipedia.org/wiki/Napoli",
        "https://it.wikipedia.org/wiki/Torinoooo", # for example this is an error
        "https://it.wikipedia.org/wiki/Palermo",
        "https://it.wikipedia.org/wiki/Venezia"))

I wrote this code for my purpose.

I tried to use try, but it didn't work.

I also tried to use ifelse(url.exists(url_web)==TRUE,Cont<-read_html(url_web), NA ) inside the for loop, but that didn't work either.

for (i in seq_along(url_web)) {
  Cont <- read_html(url_web[i])  # read the i-th URL, not the index itself
  Dist_1 <- html_nodes(Cont, ".firstHeading") %>%
    html_text()
  print(Dist_1)
}

The question is: how can I skip the URLs that can't be reached or that are misspelled?

Thank you in advance.

Francesco

Recommended answer

A simple try should do the trick:

parsed_pages <- replicate(list(), n = length(url_web))
for (k in seq_along(url_web)) parsed_pages[[k]] <- try(xml2::read_html(url_web[k]), silent = TRUE)

The silent = TRUE argument suppresses the error messages; try catches the errors either way. By default, silent = FALSE, which makes try print each error as it occurs. Note that the code works even with silent = FALSE (the reported errors might make it look as though it didn't).
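To see what try hands back when a call fails, here is a minimal offline sketch. The function failing_read and the example.invalid URL are hypothetical stand-ins for a read_html call that hits a 404; they are not part of rvest or xml2.

```r
# Hypothetical stand-in for a download that fails, much like read_html() on a 404.
failing_read <- function(url) stop("HTTP error 404.")

# try() catches the error, so execution continues past this line.
result <- try(failing_read("https://example.invalid/missing"), silent = TRUE)

# The returned object is tagged with the class "try-error" ...
is_error <- inherits(result, "try-error")

# ... and keeps the original error condition as an attribute.
error_message <- conditionMessage(attr(result, "condition"))

print(is_error)
print(error_message)
```

Because the failure is stored as a value rather than raised, a loop over many URLs can keep going and the failures can be inspected afterwards.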

Here we can test the above code:

for (k in seq_along(url_web)) print(class(parsed_pages[[k]]))
# [1] "xml_document" "xml_node"
# [1] "xml_document" "xml_node"
# [1] "xml_document" "xml_node"
# [1] "try-error"
# [1] "xml_document" "xml_node"
# [1] "xml_document" "xml_node"
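Once the failed entries are marked as "try-error", they can be dropped before extracting the headings. The following is a rough sketch that assumes rvest is installed; the list pages is a hypothetical offline stand-in for parsed_pages, built from literal HTML so that it runs without network access.

```r
library(rvest)

# Hypothetical stand-in for parsed_pages: two parsed documents and one failure.
pages <- list(
  read_html('<html><body><h1 class="firstHeading">Roma</h1></body></html>'),
  try(stop("HTTP error 404."), silent = TRUE),
  read_html('<html><body><h1 class="firstHeading">Palermo</h1></body></html>')
)

# TRUE for every entry that was parsed successfully.
ok <- !vapply(pages, inherits, logical(1), what = "try-error")

# Extract the first ".firstHeading" text from the successful pages only.
headings <- vapply(
  pages[ok],
  function(p) html_text(html_nodes(p, ".firstHeading"))[1],
  character(1)
)

print(headings)
```

With the real data, the same filter would be applied to parsed_pages, and url_web[!ok] would list the URLs that were skipped.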

