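This article looks at a case where R, downloading from an https .aspx URL, saves a web page instead of the CSV file. It may be a useful reference for anyone facing the same problem.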

Problem description

Warning: newbie here. I would appreciate some guidance. I am investing the time to learn how to automate downloads with R.

What I need: to download data on shale gas wells, for all counties and reporting periods, from this website:
https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCounty.aspx
(Note that you may be asked to accept an agreement when entering; not a big deal.)

I can get to the page that lists all the CSV files I want to download. Unfortunately, that page has the same address as the one above. (You can choose a county and a reporting period and see for yourself.)

However, once on that page, the links that trigger the CSV downloads are listed. Each of them looks something like this:
https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY
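Since the goal is every county and reporting period, the export links could be generated rather than copied by hand. The following is only a sketch: `build_export_url` is a hypothetical helper, the query parameters are copied from the example link, and the second county and second period code are illustrative guesses, not values confirmed against the site.

```r
# Base address of the export endpoint, copied from the example link.
export_base <- "https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx"

# Hypothetical helper: assemble one export URL for a county and a period code.
build_export_url <- function(county, period) {
  paste0(
    export_base,
    "?UNCONVENTIONAL_ONLY=false",
    "&INC_HOME_USE_WELLS=true",
    "&INC_NON_PRODUCING_WELLS=true",
    "&PERIOD=", period,
    "&COUNTY=", utils::URLencode(county, reserved = TRUE)
  )
}

# "ALLEGHENY" and "15AUGU" come from the example link;
# "GREENE" and "15JULU" are assumed values for illustration only.
counties <- c("ALLEGHENY", "GREENE")
periods  <- c("15AUGU", "15JULU")

# One row per county/period combination, then one URL per row.
grid <- expand.grid(county = counties, period = periods,
                    stringsAsFactors = FALSE)
urls <- mapply(build_export_url, grid$county, grid$period, USE.NAMES = FALSE)
print(urls)
```

The real county names and period codes would have to be read off the site's own drop-down lists.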

What I have tried:

library(downloader)

download("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY",
         destfile = "Prod_AUG15_Allegheny.csv")

I followed what another person did here: Download documents from aspx web page in R

The problem: this command saves a web page instead of the CSV file.

trying URL 'https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY'
Content type 'text/html; charset=utf-8' length 11592 bytes (11 Kb)
opened URL
downloaded 11 Kb
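The log reports a content type of 'text/html', which already shows that an HTML page was saved under the .csv name. When many files are downloaded in a loop, a small check on each saved file might catch this early. This is a rough sketch: `looks_like_html` is a hypothetical helper, and the two temporary files only stand in for real downloads.

```r
# Hypothetical helper: TRUE if the first lines of a file look like an HTML page.
looks_like_html <- function(path, n = 20L) {
  head_lines <- readLines(path, n = n, warn = FALSE)
  any(grepl("<!DOCTYPE html|<html", head_lines, ignore.case = TRUE))
}

# Stand-ins for downloaded files: one HTML page saved as .csv, one real CSV.
html_file <- tempfile(fileext = ".csv")
writeLines(c("<!DOCTYPE html>", "<html><body>Agreement</body></html>"), html_file)

csv_file <- tempfile(fileext = ".csv")
writeLines(c("county,period", "ALLEGHENY,15AUGU"), csv_file)

print(looks_like_html(html_file))
print(looks_like_html(csv_file))
```

The check is a heuristic on the file contents only; it does not explain why the server returned HTML.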

The question: is this related to the page being https instead of http?
Any guidance on how to solve it, or pointers to other relevant posts?
(I found some posts on aspx downloads, but nothing helpful.)

Thanks in advance

Solution

@hrbrmstr It worked! Not the way I wanted at the beginning, but with RSelenium I could click the button to accept the agreement and actually open the download link.

Here is the code (it is simple, but it took me all day to figure out, what a shame):

# Using RSelenium to save the file
## Install the package if needed
install.packages("RSelenium")
## Load the package
library("RSelenium")
## Download and start the Selenium server
## (these two helpers belong to older RSelenium releases; newer releases use rsDriver() instead)
checkForServer()
startServer()
# I had to start the server manually!
remDr <- remoteDriver()
remDr
remDr$open()
# Open the website and accept the conditions
remDr$navigate("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Welcome/Agreement.aspx")
AgreeButton <- remDr$findElement(using = "id", value = "MainContent_AgreeButton")
AgreeButton$highlightElement()
AgreeButton$clickElement()

# With the agreement accepted in this session, open the CSV export link
remDr$navigate("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY")

However, I am still not able to save the CSV file :-(. I know I need a command equivalent to "Save link as...", but I am asking about that in another topic related to RSelenium.

I will edit the answer when I find out!
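One possible way around the "Save link as..." step, untested against this site, is to hand the browser session's cookies to an ordinary HTTP client once the agreement has been accepted. This sketch assumes the site remembers the acceptance through cookies, which is not confirmed here. `cookie_header` is a hypothetical helper. It expects a list in the shape RSelenium's `getAllCookies()` returns, where each cookie has `name` and `value` fields.

```r
# Hypothetical helper: turn a list of cookies (each with $name and $value)
# into a single "Cookie" request header value.
cookie_header <- function(cookies) {
  pairs <- vapply(
    cookies,
    function(ck) paste0(ck$name, "=", ck$value),
    character(1)
  )
  paste(pairs, collapse = "; ")
}

# Made-up cookies standing in for what remDr$getAllCookies() might return.
example_cookies <- list(
  list(name = "SessionId", value = "abc123"),
  list(name = "Agreed", value = "true")
)
print(cookie_header(example_cookies))

# Possible use after AgreeButton$clickElement(), assuming the httr package is
# installed and export_url holds the CSV export link:
# cookies <- remDr$getAllCookies()
# httr::GET(export_url,
#           httr::add_headers(Cookie = cookie_header(cookies)),
#           httr::write_disk("Prod_AUG15_Allegheny.csv", overwrite = TRUE))
```

If the site relies on something other than cookies, this will still return the HTML page, and configuring the browser to download CSV files automatically would be the other route to explore.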


07-29 14:44