How do I POST a simple HTML form in R?

Problem description

I'm relatively new to R programming and I'm trying to put some of what I'm learning in the Johns Hopkins Data Science track to practical use. Specifically, I would like to automate the process of downloading historical bond prices from the US Treasury website.

Using both Firefox and R, I was able to determine that the US Treasury website uses a very simple HTML POST form to specify a single date for the quotes of interest. It then returns a table of secondary market information for all outstanding bonds.

I have unsuccessfully tried to use two different R packages to submit a request to the US Treasury web server.
Here are the two approaches I tried:

Attempt #1 (using RCurl):

    library(RCurl)

    url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

    td.html <- postForm(url,
                        submit = "Show Prices",
                        priceDate.year = 2014,
                        priceDate.month = 12,
                        priceDate.day = 15,
                        .opts = curlOptions(ssl.verifypeer = FALSE))

This results in a web page being returned and stored in td.html, but all it contains is an error message from the treasurydirect server. I know the server is working because when I submit the same request via my browser, I get the expected results.

Attempt #2 (using rvest):

    library(rvest)

    s <- html_session(url)
    f0 <- html_form(s)
    f1 <- set_values(f0[[2]], priceDate.year = 2014, priceDate.month = 12, priceDate.day = 15)
    test <- submit_form(s, f1)

Unfortunately, this approach doesn't even leave R and results in the following error message from R:

    Submitting with 'submit'
    Error in function (type, msg, asError = TRUE) : <url> malformed

I can't seem to figure out how to see what "malformed" text is being sent to rvest so that I can try to diagnose the problem.

Any suggestions or tips for solving this seemingly simple task would be greatly appreciated!

Solution

Well, it appears to work with the httr library.

    library(httr)

    url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

    # Form fields, named exactly as in the HTML form
    fd <- list(
        submit = "Show Prices",
        priceDate.year = 2014,
        priceDate.month = 12,
        priceDate.day = 15
    )

    # Send the fields as a URL-encoded form body
    resp <- POST(url, body = fd, encode = "form")
    content(resp)

The rvest library is really just a wrapper around httr. It looks like it doesn't do a good job of interpreting absolute URLs without the server name. So if you look at

    f1$url
    # [1] /GA-FI/FedInvest/selectSecurityPriceDate.htm

you see that it just has the path and not the server name. This appears to be confusing httr. If you do

    f1 <- set_values(f0[[2]], priceDate.year = 2014, priceDate.month = 12, priceDate.day = 15)
    f1$url <- url   # overwrite the path-only URL with the full URL
    test <- submit_form(s, f1)

that seems to work. Perhaps it's a bug that should be reported to rvest.
(Tested on rvest_0.1.0)
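
The two points in the answer, sending the form fields as a plain POST body and dealing with a form URL that lacks the server name, can be sketched together as below. This is only a sketch under stated assumptions: the helper names `price_date_fields`, `absolute_form_url`, and `fetch_prices` are hypothetical and not part of httr or rvest; the URL helper handles only full URLs and paths that start with `/`; and the field names are copied from the question, so they may not match the form the site serves today.

```r
# URL taken from the question.
treasury_url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# Hypothetical helper: build the form fields for one quote date.
# Field names are the ones used in the question; treat them as assumptions.
price_date_fields <- function(date) {
  date <- as.Date(date)
  list(
    submit          = "Show Prices",
    priceDate.year  = as.integer(format(date, "%Y")),
    priceDate.month = as.integer(format(date, "%m")),
    priceDate.day   = as.integer(format(date, "%d"))
  )
}

# Hypothetical helper: turn a path-only form URL such as
# "/GA-FI/FedInvest/selectSecurityPriceDate.htm" into a full URL by
# borrowing the scheme and host from the page it came from.
absolute_form_url <- function(action, base_url) {
  if (grepl("^https?://", action)) {
    return(action)  # already a full URL
  }
  if (!startsWith(action, "/")) {
    stop("only full URLs and paths starting with '/' are handled in this sketch")
  }
  origin <- sub("^(https?://[^/]+).*$", "\\1", base_url)
  paste0(origin, action)
}

# Hypothetical helper: submit the form for one date with httr, as in the answer.
# Requires the httr package and network access; it is only defined here, not called.
fetch_prices <- function(date, url = treasury_url) {
  resp <- httr::POST(url, body = price_date_fields(date), encode = "form")
  httr::stop_for_status(resp)  # fail loudly on an HTTP error status
  httr::content(resp)
}
```

For example, `fetch_prices("2014-12-15")` would send the same request as the `POST` call in the answer, and `absolute_form_url(f1$url, url)` would give a full URL to assign back to `f1$url` in the rvest workaround.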
08-22 21:24