R web scraping: getting extra text when using rvest
Problem description
I'm trying to scrape sold dates from eBay using R and rvest.
The URL is held in the variable url.
The full xpath to the first item sold date is: //*[@id="srp-river-results"]/ul/li[1]/div/div[2]/div[2]/div/span/span[1]
If I select that path and call html_text() on the result, I get nothing: character(0).
When I drop the trailing spans from the XPath and select the node with class POSITIVE instead, I get the date, but with a bunch of extra characters mixed in.
R code:
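library(rvest)  # also re-exports the %>% pipe

# url holds the address of the eBay page being scraped
readHTML <- url %>%
  read_html()

SoldDate <- readHTML %>%
  html_nodes(xpath = '//*[@id="srp-river-results"]/ul/li[1]/div/div[2]/div[2]/div') %>%
  html_nodes("[class='POSITIVE']") %>%
  html_text(trim = TRUE)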
Result:
"SoYlPd N Feb 316,Z RM9USI2021"
I should get:
"Feb 16, 2021"
Solution
There are two great answers with more detail on this issue here: "Rvest Split Data by Class Name where the class names change".
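
As a rough illustration of the idea the linked title points to, here is a minimal sketch. It assumes the extra characters come from hidden child spans interleaved with the visible ones, and that the two kinds of span can be told apart by class. The markup and the class names s-a1 and s-b2 are made up for the example; they are not eBay's real markup, and this is not necessarily what the linked answers do. Because the class names are expected to change, the sketch does not hard-code them: it concatenates the text for each class and keeps whichever result contains something shaped like a date.

```r
library(rvest)

# Hypothetical markup: visible pieces use class "s-a1", decoy pieces "s-b2".
# Both class names are invented for this example.
page <- read_html(paste0(
  '<html><body><span class="POSITIVE">',
  '<span class="s-a1">Sold </span><span class="s-b2">Y</span>',
  '<span class="s-a1">Feb 1</span><span class="s-b2">3</span>',
  '<span class="s-a1">6, </span><span class="s-b2">RM9</span>',
  '<span class="s-a1">2021</span>',
  '</span></body></html>'
))

# Take the direct child spans of the POSITIVE node, in document order.
spans <- page %>% html_nodes(".POSITIVE > span")

pieces <- data.frame(
  class = html_attr(spans, "class"),
  text  = html_text(spans),
  stringsAsFactors = FALSE
)

# Concatenate the text of each class separately (order within a class is kept).
by_class <- vapply(
  split(pieces$text, pieces$class),
  paste, character(1), collapse = ""
)

# Keep the part that looks like "Mon D, YYYY"; classes with no match are dropped.
date_pattern <- "[A-Z][a-z]{2} [0-9]{1,2}, [0-9]{4}"
sold_date <- unname(regmatches(by_class, regexpr(date_pattern, by_class)))

print(sold_date)
```

Matching on the shape of the text rather than on a fixed class name is a design choice that keeps the sketch working if the class names rotate between requests. It would need a different pattern for other date formats or locales.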