Problem description
I am trying to write an R package that accesses some data via a REST API. The API, however, doesn't use HTTP authentication, but rather relies on cookies to keep credentials with the session.
Essentially, I'd like to replace the following two lines from a bash script with two R functions: one to perform the login and store the session cookie, and a second to GET the data.
curl -X POST -c cookies.txt -d"username=xxx&password=yyy" http://api.my.url/login
curl -b cookies.txt http://api.my.url/data
I'm clearly not understanding how RCurl works with curl options. My script as it stands has:
library(RCurl)
curl <- getCurlHandle()
curlSetOpt(cookiejar='cookies.txt', curl=curl)
postForm("http://api.my.url/login", username='xxx', password='yyy', curl=curl)
getURL("http://api.my.url/data", curl=curl)
The final getURL() fails with a "Not logged in." message from the server, and after the postForm() call no cookies.txt file exists.
Recommended answer
In general you don't need to create a cookie file, unless you want to study the cookies.
That said, in the real world web servers often check the user agent, redirect after login, and expect hidden POST data, but this should help:
library(RCurl)
#Set your browsing links
loginurl = "http://api.my.url/login"
dataurl = "http://api.my.url/data"
#Set user account data and agent
pars=list(
username="xxx",
password="yyy"
)
agent="Mozilla/5.0" #or whatever
#Set RCurl pars
curl = getCurlHandle()
curlSetOpt(cookiejar="cookies.txt", useragent = agent, followlocation = TRUE, curl=curl)
#Also if you do not need to read the cookies.
#curlSetOpt( cookiejar="", useragent = agent, followlocation = TRUE, curl=curl)
#Post login form
html=postForm(loginurl, .params = pars, curl=curl)
#Go wherever you want
html=getURL(dataurl, curl=curl)
#Start parsing your page
matchref=gregexpr("... my regexp ...", html)
#... .... ...
#Clean up. This also writes the cookie file (libcurl saves the jar when the handle is released)
rm(curl)
gc()
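Since the question asks for two R functions, the script above could be wrapped roughly as follows. This is a minimal sketch, not a tested client: the names api_login() and api_get_data() are hypothetical, the URLs are the placeholders from the question, and it assumes the server accepts a URL-encoded form like the one `curl -d` sends. It also assumes that passing cookiefile = "" is enough to turn on libcurl's in-memory cookie handling, so no cookies.txt is written.

```r
# Sketch only: api_login() and api_get_data() are hypothetical names,
# and the default URLs are the placeholders used in the question.
api_login <- function(username, password,
                      loginurl = "http://api.my.url/login") {
  # cookiefile = "" enables libcurl's cookie engine without reading
  # or writing any file; cookies live in the handle itself.
  curl <- RCurl::getCurlHandle(cookiefile = "", followlocation = TRUE)
  # style = "POST" sends application/x-www-form-urlencoded data,
  # which is what `curl -d` does in the bash version.
  RCurl::postForm(loginurl,
                  .params = list(username = username, password = password),
                  style = "POST",
                  curl = curl)
  curl  # return the handle, which now carries the session cookie
}

api_get_data <- function(curl, dataurl = "http://api.my.url/data") {
  # Reusing the same handle sends the stored cookie with the request.
  RCurl::getURL(dataurl, curl = curl)
}

# Intended usage (not run here, since it needs the real server):
# session <- api_login("xxx", "yyy")
# data <- api_get_data(session)
```

Returning the handle from the login function keeps the session state in one object that the caller passes around, instead of relying on a cookie file on disk.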
Important
There can often be hidden POST data beyond username and password. To capture it, you may want to use, e.g. in Chrome, Developer tools -> Network tab to show the POST field names and values.
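As an alternative to reading the fields off the Network tab by hand, the hidden inputs can sometimes be pulled out of the login page itself. The helper below is a rough base-R sketch: extract_hidden_fields() is a hypothetical name, the sample HTML and its field names (csrf_token, next) are invented for illustration, and a regex like this will miss unquoted or unusually formatted attributes, so a proper HTML parser is the safer choice for real pages.

```r
# Hypothetical helper: collect name/value pairs from hidden <input> tags.
# Regex-based sketch; assumes attribute values are quoted.
extract_hidden_fields <- function(html) {
  # Find every <input ...> tag in the page
  tags <- regmatches(html,
                     gregexpr("<input[^>]*>", html,
                              ignore.case = TRUE, perl = TRUE))[[1]]
  # Keep only the tags declared as type="hidden"
  is_hidden <- grepl("type\\s*=\\s*[\"']hidden[\"']", tags,
                     ignore.case = TRUE, perl = TRUE)
  tags <- tags[is_hidden]

  # Return the value of one attribute in a tag, or "" if it is absent
  get_attr <- function(tag, attr) {
    pattern <- paste0("\\s", attr, "\\s*=\\s*[\"']([^\"']*)[\"']")
    m <- regmatches(tag, regexec(pattern, tag,
                                 ignore.case = TRUE, perl = TRUE))[[1]]
    if (length(m) >= 2) m[2] else ""
  }

  fields <- list()
  for (tag in tags) {
    name <- get_attr(tag, "name")
    if (nzchar(name)) fields[[name]] <- get_attr(tag, "value")
  }
  fields
}

# Invented sample page; a real one would come from the login URL.
login_page <- paste0(
  '<form><input type="hidden" name="csrf_token" value="abc123">',
  '<input type="text" name="username">',
  '<input name="next" type="hidden" value="/data"/></form>'
)

# Merge the hidden fields with the credentials before posting
pars <- c(list(username = "xxx", password = "yyy"),
          extract_hidden_fields(login_page))
```

In practice the login page would be fetched with getURL() on the same curl handle used for the later postForm() call, so that any cookie set alongside the hidden fields is sent back with the login request.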