Problem description
Warning: newbie here. I would appreciate some guidance. I am trying to invest the time to learn how to use R to automate downloads.
What I need:
To download data on shale gas wells from this website for all counties and reporting periods:
https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCounty.aspx
(Note that you might be asked to accept an agreement when entering; not a big deal.)
I can get to the page that lists all the CSV files I want to download. Unfortunately, that page has the same address as the one above. (You can choose a county and a reporting period and see for yourself.)
However, once on that page, the links that trigger the CSV downloads are listed. Each of them looks something like this:
https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY
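Since the export link differs only in its PERIOD and COUNTY parameters, the full set of links for "all counties and reporting periods" could be generated rather than copied by hand. The following is a minimal sketch under that assumption; `build_export_urls` and the `Prod_<period>_<county>.csv` file naming are hypothetical, and the only county and period codes known from this post are ALLEGHENY and 15AUGU (any others would have to be read from the site's own drop-down lists).

```r
# Base address of the export page, copied from the link above.
base_url <- paste0(
  "https://www.paoilandgasreporting.state.pa.us/publicreports/",
  "Modules/Production/ProductionByCountyExport.aspx"
)

# Hypothetical helper: returns one row per county/period combination,
# with the export URL and a suggested destination file name.
build_export_urls <- function(counties, periods) {
  grid <- expand.grid(county = counties, period = periods,
                      stringsAsFactors = FALSE)
  # Percent-encode the values in case a code contains spaces or symbols.
  enc <- function(x) {
    vapply(x, utils::URLencode, character(1),
           reserved = TRUE, USE.NAMES = FALSE)
  }
  grid$url <- paste0(
    base_url,
    "?UNCONVENTIONAL_ONLY=false",
    "&INC_HOME_USE_WELLS=true",
    "&INC_NON_PRODUCING_WELLS=true",
    "&PERIOD=", enc(grid$period),
    "&COUNTY=", enc(grid$county)
  )
  grid$destfile <- paste0("Prod_", grid$period, "_", grid$county, ".csv")
  grid
}

# The one combination known from this post:
build_export_urls("ALLEGHENY", "15AUGU")$url
```

For ALLEGHENY and 15AUGU this reproduces the link shown above. Building the URLs is only half of the task, though: fetching them still depends on the site accepting the request.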
What I have tried:
library(downloader)
# mode = "wb" writes the response byte-for-byte (mainly relevant on Windows)
download("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY",
         destfile = "Prod_AUG15_Allegheny.csv", mode = "wb")
I followed what another person did here: Download documents from aspx web page in R
The problem:
This command saves a web page instead of the CSV file.
trying URL 'https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY'
Content type 'text/html; charset=utf-8' length 11592 bytes (11 Kb)
opened URL
downloaded 11 Kb
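The log above already hints at what went wrong: the server answered with content type text/html, so what was saved is an HTML page (plausibly the agreement prompt mentioned earlier), not CSV data. https alone would not turn a CSV into HTML. A small check like the one below can flag this automatically after each download; `looks_like_html` is a hypothetical helper and the sketch only inspects the first few lines of the saved file.

```r
# Hypothetical helper: TRUE if the first lines of a file look like an HTML
# document rather than delimited data.
looks_like_html <- function(path, n = 5) {
  first_lines <- readLines(path, n = n, warn = FALSE)
  any(grepl("<!DOCTYPE html|<html", first_lines, ignore.case = TRUE))
}

# Usage: check the file written by the download() call, if it exists.
saved <- "Prod_AUG15_Allegheny.csv"
if (file.exists(saved) && looks_like_html(saved)) {
  message(saved, " contains an HTML page, not CSV data.")
}
```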
The question:
Is it related to my page being https instead of http?
Any guidance on how to solve it, or other posts that are relevant?
(I could find some posts on aspx downloads, but nothing helpful.)
Thanks in advance
@hrbrmstr It worked! Not the way I wanted at the beginning, but with RSelenium I could click the button to accept the agreement and actually open the download link.
Here is the code (it is simple, but it took me all day to figure out, what a shame):
# Using RSelenium to open the download link
## Install the package if needed
install.packages("RSelenium")
## Load the package
library("RSelenium")
## Note: checkForServer() and startServer() belong to older RSelenium
## releases; newer releases replace them with rsDriver().
checkForServer()
startServer()
# I had to start the server manually!
remDr <- remoteDriver()
remDr
remDr$open()
# Open the website and accept the conditions
remDr$navigate("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Welcome/Agreement.aspx")
AgreeButton <- remDr$findElement(using = "id", value = "MainContent_AgreeButton")
AgreeButton$highlightElement()
AgreeButton$clickElement()
# Open the CSV export link in the same browser session
remDr$navigate("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY")
However, I am not able to save the CSV file yet :-(. I know I need a command equivalent to "Save link as...", but I am asking about that in another topic related to RSelenium.
I will edit the answer when I find out!
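One commonly used way around the "Save link as..." dialog is to start Firefox with a profile that saves certain file types without asking. The sketch below assumes Firefox, a running Selenium server, and the RSelenium function `makeFirefoxProfile()`; the helper names, the download directory and the list of MIME types are assumptions (the type this site actually sends for its CSV export is not known from this post), so treat it as a starting point rather than a confirmed fix.

```r
# Hypothetical helper: Firefox preferences that send matching downloads
# straight to download_dir without showing a dialog.
csv_download_prefs <- function(download_dir) {
  list(
    browser.download.dir = download_dir,
    browser.download.folderList = 2L,  # 2 = use the custom directory above
    browser.download.manager.showWhenStarting = FALSE,
    # Assumed MIME types; adjust to whatever the server really reports.
    browser.helperApps.neverAsk.saveToDisk =
      "text/csv,application/csv,application/octet-stream"
  )
}

# Hypothetical helper: opens a browser session that uses those preferences.
# Requires the RSelenium package and a Selenium server that is already running.
open_downloading_browser <- function(download_dir) {
  profile <- RSelenium::makeFirefoxProfile(csv_download_prefs(download_dir))
  remDr <- RSelenium::remoteDriver(browserName = "firefox",
                                   extraCapabilities = profile)
  remDr$open()
  remDr
}
```

With a session opened this way, the same `navigate()`, `findElement()` and `clickElement()` steps as above would apply, and navigating to the export link should then write the file into `download_dir` instead of prompting, provided the server's content type is among those listed.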