How to wrap the css and xpath arguments of html_nodes in a generic function


I want to create a wrapper around html_nodes that accepts either a CSS or an XPath argument. My idea was to build a quoted expression that can be supplied to html_nodes and evaluated on the spot. I figured out how to create the path argument for css and xpath respectively, but when I supply this expression to html_nodes it does not work. Why not?

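library(rvest)     # read_html, html_nodes, html_attr
library(purrr)     # map, possibly
library(rlang)     # expr, !!, is_null
library(magrittr)  # %>%, extract

page_parser <- function(dat_list, path = NULL, css = FALSE, attr = "") {
  # make css or xpath argument for html_nodes
  if (css == TRUE) {
    path <- expr(`=`(css, !!path))
  } else {
    path <- expr(`=`(xpath, !!path))
  }
  # extract attribute value (this is the call that does not work)
  map(dat_list, possibly(function(x) { html_nodes(x, !!path) %>% html_attr(attr) %>% extract(1)}, NA)) %>%
    map(1) %>%
    lapply(function(x) ifelse(is_null(x), "", x)) %>%
    unlist()  # assumed final step: the pipeline is cut off here in the post
}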

read_html("https://www.freitag.de/autoren/lutz-herden/alexis-tsipras-fall-oder-praezedenzfall") %>% page_parser(path = "//meta[@property='og:title']")

read_html("https://www.freitag.de/autoren/lutz-herden/alexis-tsipras-fall-oder-praezedenzfall") %>% page_parser(path = ".title", css = TRUE)


The function should return the content behind the selector, no matter whether I specified a CSS selector or an XPath.



In general, the !! operator only works in functions that support quasiquotation. Unfortunately, rvest::html_nodes currently does not. (But since it's part of the tidyverse, I wouldn't be surprised if the support is added at a later date.)
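To make the difference concrete, here is a minimal sketch. The inline HTML snippet and the variable names are my own assumptions for illustration, not part of the question. rlang::expr() supports quasiquotation, so it can splice the value of path into a call that is then evaluated with eval(). When !! is passed directly to html_nodes, R treats it as two logical negations, which fails on a character string.

```r
library(rlang)
library(rvest)

# Assumed stand-in for a downloaded page, so the sketch runs offline.
page <- read_html('<html><body><h1 class="title">Hello</h1></body></html>')
path <- ".title"

# expr() understands !!, so the value of `path` is spliced into the call.
built <- expr(html_nodes(page, css = !!path))

# Evaluating the finished call works: html_nodes() never sees the `!!`.
title_text <- eval(built) %>% html_text()

# Passed directly, `!!path` is plain double negation of a string, which errors.
direct <- try(html_nodes(page, css = !!path), silent = TRUE)
failed <- inherits(direct, "try-error")
```

Building the whole call first and evaluating it afterwards is one way to get the "quoted expression" idea from the question to work, although the approach below is simpler.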

There are several ways to programmatically provide arguments to a function call, including do.call() from base R. However, given that you're using map to traverse your pages, I recommend pre-setting the css or xpath argument of html_nodes through purrr::partial():

# requires rvest, purrr, rlang (is_null) and magrittr (extract)
page_parser <- function(dat_list, path = NULL, css = FALSE, attr = "") {
  # make css or xpath argument for html_nodes
  if (css == TRUE) {
    f_html_nodes <- purrr::partial(html_nodes, css = path)
  } else {
    f_html_nodes <- purrr::partial(html_nodes, xpath = path)
  }

  # extract attribute value
  map(dat_list, possibly(function(x) {
    f_html_nodes(x) %>% html_attr(attr) %>% extract(1)
  }, NA)) %>%
    map(1) %>%
    lapply(function(x) ifelse(is_null(x), "", x)) %>%
    unlist()  # assumed final step: the pipeline is cut off here in the post
}
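
For comparison, the do.call() route mentioned above could look roughly like this. It is only a sketch: select_nodes is a hypothetical helper name, and the inline HTML stands in for a real page so that the example runs without a network connection.

```r
library(rvest)

# Hypothetical helper: name the selector argument "css" or "xpath" at run
# time, then hand the argument list to html_nodes() via do.call().
select_nodes <- function(page, path, css = FALSE) {
  args <- list(page, path)
  names(args) <- c("x", if (css) "css" else "xpath")
  do.call(html_nodes, args)
}

# Assumed test page, not the article from the question.
page <- read_html(paste0(
  '<html><head><meta property="og:title" content="Demo title"></head>',
  '<body><h1 class="title">Hello</h1></body></html>'
))

og_title <- select_nodes(page, "//meta[@property='og:title']") %>%
  html_attr("content")
heading <- select_nodes(page, ".title", css = TRUE) %>% html_text()
```

Both versions choose the argument name at run time. partial() fixes it once before the map() call, while do.call() assembles the arguments on every call.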

