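The code below works fine in interactive mode but fails when it is used inside a function. It is just two authentication POST requests followed by a data download. My goal is to get it working inside a function, not only in interactive mode.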

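This question is something of a sequel to this question. ICPSR recently updated their website. The minimal reproducible example below requires a free account, which is available at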

https://www.icpsr.umich.edu/rpxlogin?path=ICPSR&request_uri=https%3a%2f%2fwww.icpsr.umich.edu%2ficpsrweb%2findex.jsp

I tried adding Sys.sleep(1) and various httr::GET/httr::POST calls, but nothing helped.

my_download <-
    function( your_email , your_password ){

        values <-
            list(
                agree = "yes",
                path = "ICPSR" ,
                study = "21600" ,
                ds = "" ,
                bundle = "rdata",
                dups = "yes",
                email=your_email,
                password=your_password
            )


        httr::POST("https://www.icpsr.umich.edu/cgi-bin/terms", body = values)
        httr::POST("https://www.icpsr.umich.edu/rpxlogin", body = values)

        tf <- tempfile()
        httr::GET(
            "https://www.icpsr.umich.edu/cgi-bin/bob/zipcart2" ,
            query = values ,
            httr::write_disk( tf , overwrite = TRUE ) ,
            httr::progress()
        )

    }

# fails
my_download( "[email protected]" , "some_password" )

# stepping through works
debug( my_download )
my_download( "[email protected]" , "some_password" )

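EDIT: when it fails, it simply downloads this page as if I were not logged in (rather than the dataset), so for some reason the authentication is being lost. If you are logged in to ICPSR, use private browsing to see the page: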

https://www.icpsr.umich.edu/cgi-bin/bob/zipcart2?study=21600&ds=1&bundle=rdata&path=ICPSR

Thanks!

Best answer

This happens because httr stores its state (cookies, for example) in a handle for each URL (see ?handle).
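One way to take the implicit handle lookup out of the picture is to create a single handle and pass it to every request, so that any cookies set during login are the ones sent with the download. The following is a sketch under assumptions, not a verified fix for ICPSR: my_download_shared_handle is a hypothetical name, and whether the site accepts this sequence has not been tested here. As far as I know, httr looks up its implicit handles by scheme, host and port, which is why one handle for the host is used.

```r
# Sketch only: reuse one explicit handle across all requests.
# my_download_shared_handle is a hypothetical name, not part of httr.
my_download_shared_handle <-
    function( your_email , your_password ){

        # cookies set by the server are kept on this handle object
        h <- httr::handle( "https://www.icpsr.umich.edu" )

        values <-
            list(
                agree = "yes",
                path = "ICPSR" ,
                study = "21600" ,
                ds = "" ,
                bundle = "rdata",
                dups = "yes",
                email = your_email,
                password = your_password
            )

        # same endpoints as in the question, all sharing handle h
        httr::POST( "https://www.icpsr.umich.edu/rpxlogin" , body = values , handle = h )
        httr::POST( "https://www.icpsr.umich.edu/cgi-bin/terms" , body = values , handle = h )

        tf <- tempfile()
        httr::GET(
            "https://www.icpsr.umich.edu/cgi-bin/bob/zipcart2" ,
            query = values ,
            httr::write_disk( tf , overwrite = TRUE ) ,
            handle = h
        )
    }
```

If stale cookies from earlier interactive experiments are suspected, httr::handle_reset( "https://www.icpsr.umich.edu" ) clears the pooled handle for that host before a fresh attempt.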

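In this particular case it is not clear exactly what makes the download succeed, but one strategy is to issue a GET request to https://www.icpsr.umich.edu/cgi-bin/bob/ before authenticating and requesting the data. For example,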

my_download <-
    function( your_email , your_password ){
        ## for some reason this is required ...
        httr::GET("https://www.icpsr.umich.edu/cgi-bin/bob/")
        values <-
            list(
                agree = "yes",
                path = "ICPSR" ,
                study = "21600" ,
                ds = "" ,
                bundle = "rdata",
                dups = "yes",
                email=your_email,
                password=your_password
            )
        httr::POST("https://www.icpsr.umich.edu/rpxlogin", body = values)
        httr::POST("https://www.icpsr.umich.edu/cgi-bin/terms", body = values)
        tf <- tempfile()
        httr::GET(
            "https://www.icpsr.umich.edu/cgi-bin/bob/zipcart2" ,
            query = values ,
            httr::write_disk( tf , overwrite = TRUE ) ,
            httr::progress()
        )
    }

It is still unclear what the GET request to https://www.icpsr.umich.edu/cgi-bin/bob/ actually does or why it is needed, but it appears to work.
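Because the failure described in the question is an HTML page being saved in place of the dataset, it can help to check what actually landed on disk after each attempt. The sketch below assumes the rdata bundle is delivered as a zip archive (the zipcart2 endpoint suggests this, but it is not confirmed here); is_zip_file is a hypothetical helper name.

```r
# Hypothetical helper: guess whether a file is a zip archive by checking
# for the "PK" signature in its first two bytes.
is_zip_file <-
    function( path ){
        magic <- readBin( path , what = "raw" , n = 2L )
        length( magic ) == 2L && identical( magic , charToRaw( "PK" ) )
    }
```

Calling is_zip_file() on the temporary file inside my_download, or inspecting httr::http_type() on the response returned by the final GET, should make it clear whether a given run received the archive or the login page.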
