I need to get the web links of all the followers listed on the following page.
https://www.researchgate.net/topic/biotechnology
The topic currently has 206,770 followers. When I click the "View all" button, a pop-up list appears, and it keeps expanding as I press down.
https://www.researchgate.net/profile/Kestutis_Sasnauskas
...
The above are the links to the top followers. Is there a way to get the web links for all 206,770 followers?
Best answer
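This can be done with rvest and RSelenium. The latter is the one you really need; the former just makes your life easier. Install RSelenium from GitHub with devtools::install_github("ropensci/RSelenium"). rvest is available from CRAN.
Here is the code that does what you need.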
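siteUrl <- "http://www.researchgate.net/"
GateUrl <- "http://www.researchgate.net/publictopics.KeywordFollowersPeopleList.html?view=dialog&showFollowButton=1&followEvent=tp_followers_xflw&keywordId=4f15497280e582373c000000&offset="
library(rvest)
library(RSelenium)

# Start a local Selenium server and open a browser session.
# Note: checkForServer() and startServer() are defunct in recent RSelenium
# releases, where rsDriver() takes their place.
checkForServer()
startServer()
remDrv <- remoteDriver()
remDrv$open(silent = FALSE)

i <- 0                 # offset passed to the followers dialog
profileUrls <- c()
for (j in 1:3) {       # three requests only, as a demonstration
  print(j)
  remDrv$navigate(paste0(GateUrl, i))
  # read_html() replaces the deprecated html()
  l <- read_html(remDrv$getPageSource()[[1]])
  profileUrls <- c(profileUrls,
                   paste0(siteUrl, l %>% html_nodes(".display-name") %>% html_attr("href")))
  i <- length(profileUrls) + 1
}
remDrv$close()
profileUrls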
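A few things to note here. First, you need to work out the range of the j loop. I think each URL returns 38 profiles, so the loop should be something like for(j in 1:(followers/38)). Second, the way the code saves the links is not very efficient, since it appends to the vector on every iteration. A better solution would be to use lapply and unlist. Finally, you need Mozilla Firefox installed on your computer, because it is the default browser for RSelenium, although you can set it to use any of the other popular browsers.
Result from the first 56: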
> profileUrls
[1] "http://www.researchgate.net/profile/Jose_Carbajo2"
[2] "http://www.researchgate.net/profile/Daniele_Riccio"
[3] "http://www.researchgate.net/profile/Fiona_Togneri2"
[4] "http://www.researchgate.net/profile/Sukanya_Patel"
[5] "http://www.researchgate.net/profile/Neri_Fattorini"
[6] "http://www.researchgate.net/profile/Pham_Thi_Thuy_Van"
[7] "http://www.researchgate.net/profile/Kestutis_Sasnauskas"
[8] "http://www.researchgate.net/profile/Iris_Weintal"
[9] "http://www.researchgate.net/profile/Godelieve_Verhaegen"
[10] "http://www.researchgate.net/profile/Janani_Venkatraman2"
[11] "http://www.researchgate.net/profile/Kai_Wang126"
[12] "http://www.researchgate.net/profile/Irine_Ronin"
[13] "http://www.researchgate.net/profile/Natasha_Ikhsan"
[14] "http://www.researchgate.net/profile/Nadya_Hajar"
[15] "http://www.researchgate.net/profile/Gayatr_Venkataraman2"
[16] "http://www.researchgate.net/profile/Amsha_Viraragavan"
[17] "http://www.researchgate.net/profile/Wei_Leiyan"
[18] "http://www.researchgate.net/profile/Yosuke_Inada"
[19] "http://www.researchgate.net/profile/Nadya_Hajar"
[20] "http://www.researchgate.net/profile/Gayatr_Venkataraman2"
[21] "http://www.researchgate.net/profile/Amsha_Viraragavan"
[22] "http://www.researchgate.net/profile/Wei_Leiyan"
[23] "http://www.researchgate.net/profile/Yosuke_Inada"
[24] "http://www.researchgate.net/profile/Yongning_You"
[25] "http://www.researchgate.net/profile/Susan_Hu6"
[26] "http://www.researchgate.net/profile/Matt_Evans11"
[27] "http://www.researchgate.net/profile/Nam_Kieu"
[28] "http://www.researchgate.net/profile/Nur_Musa3"
[29] "http://www.researchgate.net/profile/Varaporn_S"
[30] "http://www.researchgate.net/profile/Askar_Begzat3"
[31] "http://www.researchgate.net/profile/Bing_Wang63"
[32] "http://www.researchgate.net/profile/Xuebin_Yan"
[33] "http://www.researchgate.net/profile/Roberto_Sibaja_Hernandez"
[34] "http://www.researchgate.net/profile/Stephen_Heimann"
[35] "http://www.researchgate.net/profile/Hanina_Hanifa"
[36] "http://www.researchgate.net/profile/Bo_Wang143"
[37] "http://www.researchgate.net/profile/Xuebin_Yan"
[38] "http://www.researchgate.net/profile/Roberto_Sibaja_Hernandez"
[39] "http://www.researchgate.net/profile/Stephen_Heimann"
[40] "http://www.researchgate.net/profile/Hanina_Hanifa"
[41] "http://www.researchgate.net/profile/Bo_Wang143"
[42] "http://www.researchgate.net/profile/Huili_Li5"
[43] "http://www.researchgate.net/profile/Giuseppe_Infusini"
[44] "http://www.researchgate.net/profile/Carmen_Wacher"
[45] "http://www.researchgate.net/profile/Linyn_Linyn"
[46] "http://www.researchgate.net/profile/Dan_Youel"
[47] "http://www.researchgate.net/profile/Catherine_Williams16"
[48] "http://www.researchgate.net/profile/Nichole_Macaraeg"
[49] "http://www.researchgate.net/profile/Peter_Oroszlan"
[50] "http://www.researchgate.net/profile/Eduard_Karamov"
[51] "http://www.researchgate.net/profile/Mauricio_Franco3"
[52] "http://www.researchgate.net/profile/Patricia_Zancan"
[53] "http://www.researchgate.net/profile/Rohana_Dassanayake"
[54] "http://www.researchgate.net/profile/Khadija_Khataby"
[55] "http://www.researchgate.net/profile/Imane_Moest"
[56] "http://www.researchgate.net/profile/Rory_Adey"
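The lapply and unlist suggestion from the answer could look roughly like the sketch below. It is only an illustration under stated assumptions: pageCount, makeFetcher and collectProfiles are hypothetical helper names, the page size of 38 is the answer's own guess, and the offset parameter is assumed to be zero-based and to advance by one page size per request. The call to unique is there because the sample output above contains repeated entries.

```r
# Hypothetical helpers for illustration; not part of rvest or RSelenium.

# Number of requests needed for a given follower count.
# perPage = 38 is the answer's guess, not a documented value.
pageCount <- function(followers, perPage = 38) {
  ceiling(followers / perPage)
}

# Build a function that fetches one page of profile URLs through an
# already opened RSelenium remote driver.
makeFetcher <- function(remDrv, gateUrl, siteUrl) {
  function(offset) {
    remDrv$navigate(paste0(gateUrl, offset))
    page <- xml2::read_html(remDrv$getPageSource()[[1]])
    links <- rvest::html_attr(rvest::html_nodes(page, ".display-name"), "href")
    paste0(siteUrl, links)
  }
}

# Fetch every page with lapply, flatten with unlist, drop repeated URLs.
# Assumes the offset is zero-based and advances by perPage each request.
collectProfiles <- function(fetchPage, pages, perPage = 38) {
  offsets <- (seq_len(pages) - 1) * perPage
  unique(unlist(lapply(offsets, fetchPage)))
}
```

With the driver from the answer still open, this would be called as collectProfiles(makeFetcher(remDrv, GateUrl, siteUrl), pageCount(206770)). Under the 38-per-page assumption that is 5442 requests, so in practice you would likely want a pause between requests and a check of the real page size first.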