How to use a loop to scrape website data from multiple web pages in R?
Question
I would like to apply a loop to scrape data from multiple webpages in R. I am able to scrape the data for one webpage, however when I attempt to use a loop for multiple pages, I get a frustrating error. I have spent hours tinkering, to no avail. Any help would be greatly appreciated!!!
This works:
###########################
# GET COUNTRY DATA
###########################
library("rvest")
site <- paste("http://www.countryreports.org/country/","Norway",".htm", sep="")
site <- html(site)
stats <-
  data.frame(names = site %>% html_nodes(xpath="//*/td[1]") %>% html_text(),
             facts = site %>% html_nodes(xpath="//*/td[2]") %>% html_text(),
             stringsAsFactors=FALSE)
stats$country <- "Norway"
stats$names <- gsub('[\r\n\t]', '', stats$names)
stats$facts <- gsub('[\r\n\t]', '', stats$facts)
View(stats)
However, when I attempt to write this in a loop, I receive an error
###########################
# ATTEMPT IN A LOOP
###########################
country<-c("Norway","Sweden","Finland","France","Greece","Italy","Spain")
for(i in country){
site <- paste("http://www.countryreports.org/country/",country,".htm", sep="")
site <- html(site)
stats<-
data.frame(names =site %>% html_nodes(xpath="//*/td[1]") %>% html_text() ,
facts =site %>% html_nodes(xpath="//*/td[2]") %>% html_text() ,
stringsAsFactors=FALSE)
stats$country <- country
stats$names <- gsub('[\r\n\t]', '', stats$names)
stats$facts <- gsub('[\r\n\t]', '', stats$facts)
stats<-rbind(stats,stats)
stats<-stats[!duplicated(stats),]
}
Error:
Error: length(url) == 1 is not TRUE
In addition: Warning message:
In if (grepl("^http", x)) { :
the condition has length > 1 and only the first element will be used
Recommended answer
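The error message says the URL argument must have length 1. In the question's loop, paste() receives the whole `country` vector rather than the loop variable `i`, and paste() is vectorized, so it returns seven URLs at once. The following is a minimal sketch of that difference; it only builds the URL strings and does not fetch anything. The names `urls_all` and `url_one` are illustrative and not part of the original post.

```r
country <- c("Norway", "Sweden", "Finland", "France", "Greece", "Italy", "Spain")

# Passing the whole vector: paste() is vectorized, so this yields one URL
# per country, i.e. a character vector of length 7.
urls_all <- paste("http://www.countryreports.org/country/", country, ".htm", sep = "")
length(urls_all)

# Passing the loop variable: exactly one URL per iteration.
for (i in country) {
  url_one <- paste("http://www.countryreports.org/country/", i, ".htm", sep = "")
  stopifnot(length(url_one) == 1)
}
```

Two further details differ between the question's loop and the working version: the country column has to be filled from `i` rather than from `country`, and the results have to accumulate in a separate object, because `rbind(stats, stats)` only binds the current page's rows to themselves and is overwritten on the next iteration.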
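Final working code (the URL is now built from the loop variable `i` instead of the whole `country` vector):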
###########################
# THIS WORKS!!!!
###########################
library("rvest")
country <- c("Norway","Sweden","Finland","France","Greece","Italy","Spain")
all_stats <- NULL  # accumulator; a bare `all` would refer to base::all()
for (i in country) {
  # build one URL per iteration from the loop variable i
  site <- paste("http://www.countryreports.org/country/", i, ".htm", sep="")
  site <- read_html(site)  # read_html() supersedes the deprecated html()
  stats <-
    data.frame(names = site %>% html_nodes(xpath="//*/td[1]") %>% html_text(),
               facts = site %>% html_nodes(xpath="//*/td[2]") %>% html_text(),
               stringsAsFactors=FALSE)
  stats$nm <- i  # tag every row with its country
  stats$names <- gsub('[\r\n\t]', '', stats$names)
  stats$facts <- gsub('[\r\n\t]', '', stats$facts)
  #stats<-stats[!duplicated(stats),]
  all_stats <- rbind(all_stats, stats)  # append this country's rows
}
View(all_stats)
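As an optional variation, the same accumulation can be written with lapply() and a single rbind at the end, which also makes it easy to skip a page that fails to load instead of aborting the whole loop. This is only a sketch under assumptions: `collect_stats` and `fake_fetch` are hypothetical names that do not appear in the original post, and `fake_fetch` is a stand-in returning placeholder rows so the example runs without network access. In real use it would be replaced by a function that performs the per-country scraping step shown above.

```r
# Hypothetical helper: applies `fetch_stats` to each country, skips failures,
# and stacks the resulting data frames into one.
collect_stats <- function(countries, fetch_stats) {
  pieces <- lapply(countries, function(nm) {
    tryCatch(
      fetch_stats(nm),
      error = function(e) {
        warning("Skipping ", nm, ": ", conditionMessage(e), call. = FALSE)
        NULL  # a failed page contributes no rows
      }
    )
  })
  pieces <- Filter(Negate(is.null), pieces)  # drop the skipped countries
  do.call(rbind, pieces)
}

# Stand-in for the real scraping step (assumption: placeholder values only).
fake_fetch <- function(nm) {
  if (nm == "Atlantis") stop("page not found")
  data.frame(names = c("field_a", "field_b"),
             facts = c("placeholder", "placeholder"),
             nm = nm,
             stringsAsFactors = FALSE)
}

result <- suppressWarnings(
  collect_stats(c("Norway", "Atlantis", "Sweden"), fake_fetch)
)
```

Here `result` contains the rows for Norway and Sweden only, since the stand-in deliberately fails for "Atlantis".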