How to check whether an XPath exists in R



Hoping someone more knowledgeable than me can shed some light here.


As part of a larger web scraper I want to pull metadata out of a set of pages. When I ran it, it fell over; investigation shows this was because one of the requested XPath expressions matched nothing on the page.


I can see one potential solution is to grab ALL the meta for a page into a vector and to check if each required one exists before building a new vector of just those I want.



It would be even better if I only grabbed the bits I want if they exist in the page.

library(XML)

parsed <- htmlParse("http://www.coindesk.com/information")

meta <- list()
meta[1] <- xpathSApply(parsed, "//meta[starts-with(@property, \"og:title\")]", xmlGetAttr,"content")
meta[2] <- xpathApply(parsed, "//meta[starts-with(@property, \"og:description\")]", xmlGetAttr,"content")
meta[3] <- xpathApply(parsed, "//meta[starts-with(@property, \"og:url\")]",  xmlGetAttr,"content")
meta[4] <- xpathApply(parsed, "//meta[starts-with(@property, \"article:published_time\")]",  xmlGetAttr,"content")
meta[5] <- xpathApply(parsed, "//meta[starts-with(@property, \"article:modified_time\")]",  xmlGetAttr,"content")


This will throw an error as og:description isn't in this page.

Error in meta[2] <- xpathApply(parsed, "//meta[starts-with(@property, \"og:description\")]",  :
  replacement has length zero
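
The message itself comes from base R's subassignment rather than from the XML package: a query with no matches returns an empty list, and an empty list cannot be assigned into a single slot. A minimal sketch that reproduces it without any network access (the variable names `empty` and `result` are illustrative only):

```r
# Base R only: assigning a zero-length value into one slot of a list fails.
meta <- list()
empty <- list()   # stands in for the result of an XPath query with no matches

# Capture the error message instead of letting the script stop.
result <- tryCatch({
  meta[2] <- empty
  "assigned"
}, error = function(e) conditionMessage(e))

print(result)
```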


Can anyone suggest a simple test that will check for its existence before trying to extract it, falling over gracefully with perhaps a NULL response?



Assuming the error occurs when you try to process the empty list:

> parsed <- htmlParse("http://www.coindesk.com/information")
> meta <- xpathApply(parsed, "//meta[starts-with(@property, \"og:description\")]", xmlGetAttr,"content")
> meta
list()
> length(meta)==0
[1] TRUE


Then test for length(meta)==0, which is TRUE if the element is missing. Otherwise it is FALSE, as in this example of extracting the title property:

> meta <- xpathApply(parsed, "//meta[starts-with(@property, \"og:title\")]", xmlGetAttr,"content")
> meta
[[1]]
[1] "Beginner's guide to bitcoin - CoinDesk's Information Center"

> length(meta)==0
[1] FALSE
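
The length test can be wrapped in a small helper so that each property is looked up the same way and a missing one yields a placeholder instead of an error. This is only a sketch under some assumptions: `get_meta` is a hypothetical helper (not part of the XML package), the inline HTML string stands in for the real page so the example runs offline, and `NA` is returned rather than `NULL` so the results stay in one flat character vector.

```r
library(XML)

# Hypothetical inline page standing in for the real site;
# og:description is left out on purpose.
html <- '<html><head>
<meta property="og:title" content="Example title"/>
<meta property="og:url" content="http://example.com/page"/>
</head><body></body></html>'
parsed <- htmlParse(html, asText = TRUE)

# Illustrative helper: return the content attribute of the first matching
# <meta> tag, or NA if the XPath matches nothing.
get_meta <- function(doc, property) {
  xpath <- sprintf("//meta[starts-with(@property, '%s')]", property)
  hits <- xpathApply(doc, xpath, xmlGetAttr, "content")
  if (length(hits) == 0) return(NA_character_)
  hits[[1]]
}

wanted <- c("og:title", "og:description", "og:url")
meta <- sapply(wanted, function(p) get_meta(parsed, p))
print(meta)
```

The result is a character vector named by property, with NA in the og:description slot. If a NULL is preferred, the helper could return NULL and the results could be collected with lapply instead, at the cost of getting a list back.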

