Problem description
I used the following code:
library(XML)
library(RCurl)
getGoogleURL <- function(search.term, domain = '.co.uk', quotes=TRUE)
{
    search.term <- gsub(' ', '%20', search.term)
    if(quotes) search.term <- paste('%22', search.term, '%22', sep='')
    getGoogleURL <- paste('http://www.google', domain, '/search?q=',
                          search.term, sep='')
}
getGoogleLinks <- function(google.url)
{
    doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
    html <- htmlTreeParse(doc, useInternalNodes = TRUE, error=function(...){})
    nodes <- getNodeSet(html, "//a[@href][@class='l']")
    return(sapply(nodes, function(x) x <- xmlAttrs(x)[[1]]))
}
search.term <- "cran"
quotes <- "FALSE"
search.url <- getGoogleURL(search.term=search.term, quotes=quotes)
links <- getGoogleLinks(search.url)
I would like to find all the links returned by my search, but I get the following result:
> links
list()
How can I get the links? In addition, I would like to get the headlines and summaries of the Google results. How can I get them? And finally, is there a way to get the links that reside in the ChillingEffects.org results?
Recommended answer
If you look at the html variable, you can see that the search result links are all nested in <h3 class="r"> tags.
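To see why the original XPath produces list(), it may help to run both expressions against a small static page. This is only a sketch: the markup below is a hand-written stand-in (an assumption, not real Google output) that mimics the <h3 class="r"> nesting, and the names sample.html, old.links and new.links are made up for illustration.

```r
library(XML)

# Hand-written stand-in for a results page (assumed markup, not real Google output)
sample.html <- paste0(
  '<html><body>',
  '<h3 class="r"><a href="http://cran.r-project.org/">CRAN</a></h3>',
  '<h3 class="r"><a href="http://example.org/mirror">A mirror</a></h3>',
  '</body></html>'
)
doc <- htmlParse(sample.html, asText = TRUE)

# Original XPath: no <a> carries class="l", so the node set is empty
old.nodes <- getNodeSet(doc, "//a[@href][@class='l']")
# sapply() over an empty node set has nothing to simplify and yields an empty list
old.links <- sapply(old.nodes, function(x) xmlAttrs(x)[[1]])

# Alternative XPath: anchors nested inside <h3 class="r">
new.nodes <- getNodeSet(doc, "//h3[@class='r']//a")
new.links <- sapply(new.nodes, function(x) xmlAttrs(x)[["href"]])

print(length(old.nodes))
print(new.links)
```

With this sample, the first expression matches nothing, which is consistent with the empty list() shown in the question, while the second returns one href per result.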
Try changing your getGoogleLinks function to:
getGoogleLinks <- function(google.url) {
    doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
    html <- htmlTreeParse(doc, useInternalNodes = TRUE, error=function(...){})
    nodes <- getNodeSet(html, "//h3[@class='r']//a")
    return(sapply(nodes, function(x) x <- xmlAttrs(x)[["href"]]))
}
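The question also asks for the headlines. As a rough sketch, and assuming each headline is simply the text of the same <a> node, xmlValue can be applied to the nodes that were already selected. getGoogleResults is a hypothetical helper name, and the sample markup is a hand-written assumption rather than real Google output. Summaries are left out because the answer does not say which element holds them; they would need their own XPath, found by inspecting the html variable.

```r
library(XML)

# Hypothetical helper: takes raw HTML text and returns links plus headlines
getGoogleResults <- function(doc.text) {
  html <- htmlParse(doc.text, asText = TRUE)
  nodes <- getNodeSet(html, "//h3[@class='r']//a")
  data.frame(
    # href attribute of each anchor (NA if an anchor has no href)
    link = vapply(nodes,
                  function(x) xmlGetAttr(x, "href", default = NA_character_),
                  character(1)),
    # text content of each anchor, used here as the headline
    headline = vapply(nodes, function(x) xmlValue(x), character(1)),
    stringsAsFactors = FALSE
  )
}

# Hand-written stand-in for a results page (assumed markup)
sample.html <- paste0(
  '<html><body>',
  '<h3 class="r"><a href="http://cran.r-project.org/">The Comprehensive R Archive Network</a></h3>',
  '<h3 class="r"><a href="http://example.org/mirror">A CRAN mirror</a></h3>',
  '</body></html>'
)

results <- getGoogleResults(sample.html)
print(results)
```

In a live setting, the text returned by getURL would be passed in place of sample.html. Returning a data frame keeps each link paired with its headline.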