This article describes how to handle search results that span multiple pages when web scraping in R. It should be a useful reference for anyone facing the same problem; follow along below.

Problem description

I am working on a web scraping program to search for specific wines and return a list of local wines of that variety. The problem I am having is multiple page results. The code below is a basic example of what I am working with:

url2 <- "http://www.winemag.com/?s=washington+merlot&search_type=reviews"
htmlpage2 <- read_html(url2)
names2 <- html_nodes(htmlpage2, ".review-listing .title")
Wines2 <- html_text(names2)

For this specific search there are 39 pages of results. I know the URL changes to http://www.winemag.com/?s=washington%20merlot&drink_type=wine&page=2, but is there an easy way to make the code loop through all the returned pages and compile the results from all 39 pages into a single list? I know I can manually do all the urls, but that seems like overkill.

Solution

You can do something similar with purrr::map_df() as well if you want all the info as a data.frame:

library(rvest)
library(purrr)

url_base <- "http://www.winemag.com/?s=washington merlot&drink_type=wine&page=%d"

map_df(1:39, function(i) {

  # simple but effective progress indicator
  cat(".")

  pg <- read_html(sprintf(url_base, i))

  data.frame(wine        = html_text(html_nodes(pg, ".review-listing .title")),
             excerpt     = html_text(html_nodes(pg, "div.excerpt")),
             rating      = gsub(" Points", "", html_text(html_nodes(pg, "span.rating"))),
             appellation = html_text(html_nodes(pg, "span.appellation")),
             price       = gsub("\\$", "", html_text(html_nodes(pg, "span.price"))),
             stringsAsFactors = FALSE)

}) -> wines

dplyr::glimpse(wines)
## Observations: 1,170
## Variables: 5
## $ wine        (chr) "Charles Smith 2012 Royal City Syrah (Columbia Valley (WA)...
## $ excerpt     (chr) "Green olive, green stem and fresh herb aromas are at the ...
## $ rating      (chr) "96", "95", "94", "93", "93", "93", "93", "93", "93", "93"...
## $ appellation (chr) "Columbia Valley", "Columbia Valley", "Columbia Valley", "...
## $ price       (chr) "140", "70", "70", "20", "70", "40", "135", "50", "60", "3...
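As a small follow-up that is not part of the original answer: the map_df() call above returns rating and price as character columns and fires 39 requests back to back. The sketch below reuses the same URL pattern and CSS selectors from the answer, but pulls the per-page work into a named helper, pauses briefly between requests, and converts rating and price to numeric afterwards. The one-second pause and the dplyr type conversion are my additions, offered as one reasonable variation rather than the accepted solution.

library(rvest)
library(purrr)
library(dplyr)

url_base <- "http://www.winemag.com/?s=washington merlot&drink_type=wine&page=%d"

# Scrape a single results page; selectors are the same ones used in the answer above.
scrape_page <- function(i) {
  Sys.sleep(1)  # polite pause between requests (my addition, not from the original answer)
  pg <- read_html(sprintf(url_base, i))
  data.frame(
    wine        = html_text(html_nodes(pg, ".review-listing .title")),
    excerpt     = html_text(html_nodes(pg, "div.excerpt")),
    rating      = gsub(" Points", "", html_text(html_nodes(pg, "span.rating"))),
    appellation = html_text(html_nodes(pg, "span.appellation")),
    price       = gsub("\\$", "", html_text(html_nodes(pg, "span.price"))),
    stringsAsFactors = FALSE
  )
}

# Bind all 39 pages into one data.frame, then convert the character columns to numbers.
wines <- map_df(1:39, scrape_page) %>%
  mutate(rating = as.numeric(rating),   # "96"  -> 96
         price  = as.numeric(price))    # "140" -> 140

With numeric columns in place, ordinary dplyr verbs such as filter(wines, rating >= 93) or arrange(wines, price) work directly on the scraped table.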