I need to get the profile links of all the followers listed on the following page.

https://www.researchgate.net/topic/biotechnology

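This topic currently has 206,770 followers. When I click the "View all" button, a popup appears with a list of followers, and the list keeps expanding as I scroll down.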

https://www.researchgate.net/profile/Kestutis_Sasnauskas
...

The above is the link of one of the top followers. Is there a way to get the web links of all 206,770 followers?

Best answer

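This can be done with rvest and RSelenium. The latter is the one you really need; the former just makes your life easier. Install RSelenium from GitHub with devtools::install_github("ropensci/RSelenium"); rvest is available from CRAN.

Here is the code that does what you need.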

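siteUrl <- "http://www.researchgate.net/"
GateUrl <- "http://www.researchgate.net/publictopics.KeywordFollowersPeopleList.html?view=dialog&showFollowButton=1&followEvent=tp_followers_xflw&keywordId=4f15497280e582373c000000&offset="

library(rvest)
library(RSelenium)

# Start a local Selenium server and open a browser session.
# Note: checkForServer() and startServer() are defunct in newer RSelenium
# releases, where rsDriver() takes over this role.
checkForServer()
startServer()
remDrv <- remoteDriver()
remDrv$open(silent = FALSE)

i <- 0               # offset appended to the followers URL
profileUrls <- c()

for (j in 1:3) {     # only 3 requests here, as a demonstration
  print(j)
  remDrv$navigate(paste0(GateUrl, i))
  # Parse the rendered page source (read_html() replaces the deprecated html())
  l <- read_html(remDrv$getPageSource()[[1]])
  # Collect the href of every follower name link and make it absolute
  profileUrls <- c(profileUrls,
               paste0(siteUrl, l %>% html_nodes(".display-name") %>% html_attr("href")))
  i <- length(profileUrls) + 1
}

remDrv$close()
profileUrls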


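A few notes. First, you need to work out the range of the j loop. I think each URL returns 38 profiles, so it should be something like for(j in 1:(followers/38)), where followers is the total number of followers.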
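As a rough sketch of that calculation: the snippet below rounds the page count up so the last partial page is not dropped. The page size of 38 is only the guess stated above, and the names perPage, nPages and offsets are made up for illustration. The sample output further down contains repeated entries, so the real page size and the offset arithmetic are worth checking against the site before relying on this.

```r
# Assumptions: follower count taken from the question; 38 profiles per
# request is the answer's unverified guess.
followers <- 206770
perPage   <- 38

# Round up so a final, partially filled page is still requested
nPages <- ceiling(followers / perPage)

# One offset per request, starting at 0
offsets <- (seq_len(nPages) - 1) * perPage

nPages
head(offsets)
```

The loop would then run over offsets rather than a fixed 1:3.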

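Second, the way the code stores the links is not very efficient, because it appends to the vector on every iteration. A better solution is to use lapply and unlist.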
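A minimal sketch of the lapply/unlist idea. fetchProfiles is a hypothetical helper, not part of rvest or RSelenium; here it returns placeholder strings so the snippet runs without a browser. In real use its body would hold the navigate, page-source parsing and html_nodes steps from the loop above.

```r
# Hypothetical helper: returns the profile URLs found at one offset.
# The body is a placeholder that fabricates three dummy URLs per call.
fetchProfiles <- function(offset) {
  paste0("http://www.researchgate.net/profile/Placeholder_", offset + 1:3)
}

# Offsets to request (placeholder values matching the dummy page size of 3)
offsets <- c(0, 3, 6)

# lapply builds one character vector per offset; unlist flattens them once,
# instead of growing profileUrls inside a for loop
profileUrls <- unlist(lapply(offsets, fetchProfiles))
profileUrls
```

The result is built in one step, which avoids copying the whole vector each time a page is added.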

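Finally, you need Mozilla Firefox installed on your computer, since it is the default browser for RSelenium, although you can configure it to use any of the other popular browsers.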

Result
The first 56:

> profileUrls
[1] "http://www.researchgate.net/profile/Jose_Carbajo2"
[2] "http://www.researchgate.net/profile/Daniele_Riccio"
[3] "http://www.researchgate.net/profile/Fiona_Togneri2"
[4] "http://www.researchgate.net/profile/Sukanya_Patel"
[5] "http://www.researchgate.net/profile/Neri_Fattorini"
[6] "http://www.researchgate.net/profile/Pham_Thi_Thuy_Van"
[7] "http://www.researchgate.net/profile/Kestutis_Sasnauskas"
[8] "http://www.researchgate.net/profile/Iris_Weintal"
[9] "http://www.researchgate.net/profile/Godelieve_Verhaegen"
[10] "http://www.researchgate.net/profile/Janani_Venkatraman2"
[11] "http://www.researchgate.net/profile/Kai_Wang126"
[12] "http://www.researchgate.net/profile/Irine_Ronin"
[13] "http://www.researchgate.net/profile/Natasha_Ikhsan"
[14] "http://www.researchgate.net/profile/Nadya_Hajar"
[15] "http://www.researchgate.net/profile/Gayatr_Venkataraman2"
[16] "http://www.researchgate.net/profile/Amsha_Viraragavan"
[17] "http://www.researchgate.net/profile/Wei_Leiyan"
[18] "http://www.researchgate.net/profile/Yosuke_Inada"
[19] "http://www.researchgate.net/profile/Nadya_Hajar"
[20] "http://www.researchgate.net/profile/Gayatr_Venkataraman2"
[21] "http://www.researchgate.net/profile/Amsha_Viraragavan"
[22] "http://www.researchgate.net/profile/Wei_Leiyan"
[23] "http://www.researchgate.net/profile/Yosuke_Inada"
[24] "http://www.researchgate.net/profile/Yongning_You"
[25] "http://www.researchgate.net/profile/Susan_Hu6"
[26] "http://www.researchgate.net/profile/Matt_Evans11"
[27] "http://www.researchgate.net/profile/Nam_Kieu"
[28] "http://www.researchgate.net/profile/Nur_Musa3"
[29] "http://www.researchgate.net/profile/Varaporn_S"
[30] "http://www.researchgate.net/profile/Askar_Begzat3"
[31] "http://www.researchgate.net/profile/Bing_Wang63"
[32] "http://www.researchgate.net/profile/Xuebin_Yan"
[33] "http://www.researchgate.net/profile/Roberto_Sibaja_Hernandez"
[34] "http://www.researchgate.net/profile/Stephen_Heimann"
[35] "http://www.researchgate.net/profile/Hanina_Hanifa"
[36] "http://www.researchgate.net/profile/Bo_Wang143"
[37] "http://www.researchgate.net/profile/Xuebin_Yan"
[38] "http://www.researchgate.net/profile/Roberto_Sibaja_Hernandez"
[39] "http://www.researchgate.net/profile/Stephen_Heimann"
[40] "http://www.researchgate.net/profile/Hanina_Hanifa"
[41] "http://www.researchgate.net/profile/Bo_Wang143"
[42] "http://www.researchgate.net/profile/Huili_Li5"
[43] "http://www.researchgate.net/profile/Giuseppe_Infusini"
[44] "http://www.researchgate.net/profile/Carmen_Wacher"
[45] "http://www.researchgate.net/profile/Linyn_Linyn"
[46] "http://www.researchgate.net/profile/Dan_Youel"
[47] "http://www.researchgate.net/profile/Catherine_Williams16"
[48] "http://www.researchgate.net/profile/Nichole_Macaraeg"
[49] "http://www.researchgate.net/profile/Peter_Oroszlan"
[50] "http://www.researchgate.net/profile/Eduard_Karamov"
[51] "http://www.researchgate.net/profile/Mauricio_Franco3"
[52] "http://www.researchgate.net/profile/Patricia_Zancan"
[53] "http://www.researchgate.net/profile/Rohana_Dassanayake"
[54] "http://www.researchgate.net/profile/Khadija_Khataby"
[55] "http://www.researchgate.net/profile/Imane_Moest"
[56] "http://www.researchgate.net/profile/Rory_Adey"
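Note that the output above contains repeats: entries 14 to 18 appear again as 19 to 23, and entries 32 to 36 again as 37 to 41, so consecutive requests seem to overlap. A small sketch of removing them with base R's unique(), using a few of the URLs above as sample input (the variable names are illustrative only):

```r
# A handful of URLs from the output above, including one repeated entry
urls <- c(
  "http://www.researchgate.net/profile/Nadya_Hajar",
  "http://www.researchgate.net/profile/Wei_Leiyan",
  "http://www.researchgate.net/profile/Nadya_Hajar",
  "http://www.researchgate.net/profile/Yongning_You"
)

# Keep the first occurrence of each URL, preserving order
uniqueUrls <- unique(urls)

# Count how many entries were repeats
nDuplicates <- sum(duplicated(urls))

uniqueUrls
nDuplicates
```

Applying unique(profileUrls) after the scrape finishes would clean the full result the same way.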

10-04 12:54