Problem description
I am trying to pull individual ratings from Glassdoor (the API only provides summary ratings) using the rvest package in R and SelectorGadget to identify my CSS selectors.
The problem is Glassdoor uses images to convey the ratings, but the numeric rating is contained in the image title. Using SelectorGadget, I can scrape the "Comp & Benefits" text from the code snippet below (using "#EmployerReviews undecorated li"), but I can't get to the "2.0" in the span...title= section, which is what I want.
<div id='EmployerReviews'>
  ....
  <ul class='undecorated'>
    <li>
      <div class='minor'>Comp & Benefits</div>
      <span class='notranslate notranslate_title gdBars gdRatings med ' title="2.0">
Has anyone had success scraping image titles in the past, or does anyone know of another way to get these individual ratings?
Recommended answer
You will need to select the span and use html_attr() to extract its attribute value:
library(rvest)

# read_html() replaces html(), which was deprecated in later rvest releases
page <- read_html("...")
rating <- page %>%
  html_nodes("#EmployerReviews .undecorated li span.gdRatings") %>%
  html_attr("title")
rating
# [1] "2.0"
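
To check the selector without hitting the live site, the sketch below parses an inline copy of the fragment from the question and pulls the category label and its rating together. The closing tags, the escaped ampersand, and the names snippet, items, labels, and ratings are assumptions added for illustration; the real Glassdoor markup may differ and may have changed.

```r
library(rvest)

# Hand-closed copy of the fragment from the question (closing tags are assumed).
snippet <- "<div id='EmployerReviews'>
  <ul class='undecorated'>
    <li>
      <div class='minor'>Comp &amp; Benefits</div>
      <span class='notranslate notranslate_title gdBars gdRatings med ' title='2.0'></span>
    </li>
  </ul>
</div>"

page <- read_html(snippet)

# One node per <li>, so each label stays aligned with its rating.
items <- html_nodes(page, "#EmployerReviews .undecorated li")

# html_node() returns exactly one match (or a missing node) per item.
labels <- items %>% html_node("div.minor") %>% html_text(trim = TRUE)
ratings <- items %>% html_node("span.gdRatings") %>% html_attr("title") %>% as.numeric()

data.frame(category = labels, rating = ratings)
```

Selecting the li elements first and then calling html_node() on each keeps labels and ratings the same length, even if a list item lacks a rating span (the rating is then NA).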