Problem description
I'm attempting to download a PNG image from a secure site through R.
To access the secure site I used rvest, which worked well.
So far I've extracted the URL for the PNG image.
How can I download the image at this link using rvest?
Functions outside of rvest return errors because they do not have permission.
library(rvest)
uastring <- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
session <- html_session("https://url.png", user_agent(uastring))
form <- html_form(session)[[1]]
form <- set_values(form, username = "***", password = "***", cookie_checkbox = TRUE)
session <- submit_form(session, form)
session2 <- jump_to(session, "https://url.png")
## Status 200 using rvest, successfully accessed page.
session
<session> https://url.png
Status: 200
Type: image/png
Size: 438935
## Using download.file returns status 403, page unable to open.
download.file("https://url.png", destfile = "t.png")
cannot open: HTTP status was '403 Forbidden'
I have tried readPNG and download.file on the URL. Both failed because they do not have permission to download from an authenticated secure site (error 403), which is why I used rvest in the first place.
Recommended answer
Here's one example that downloads the R logo into the current directory.
library(rvest)
url <- "https://www.r-project.org"
imgsrc <- read_html(url) %>%
html_node(xpath = '//*/img') %>%
html_attr('src')
imgsrc
# [1] "/Rlogo.png"
# side-effect!
# mode = "wb" writes the file in binary mode, which keeps the PNG intact on Windows
download.file(paste0(url, imgsrc), destfile = basename(imgsrc), mode = "wb")
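The `paste0(url, imgsrc)` step above works because this particular `src` starts with a slash. As a hedged sketch, `xml2::url_absolute()` can resolve the `src` against the page URL instead, which should also cope with `src` values that are already absolute or relative to a subdirectory. The variable name `abs_url` is only illustrative.

```r
library(xml2)

url    <- "https://www.r-project.org"
imgsrc <- "/Rlogo.png"   # value taken from the example above

# Resolve the (possibly relative) src against the page URL
abs_url <- url_absolute(imgsrc, url)
abs_url
```

The resolved `abs_url` can then be passed to `download.file()` in place of `paste0(url, imgsrc)`.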
Edit
Since authentication is involved, Austin's suggestion of using a session is certainly required. Try this:
library(rvest)
library(httr)
sess <- html_session(url)
imgsrc <- sess %>%
read_html() %>%
html_node(xpath = '//*/img') %>%
html_attr('src')
img <- jump_to(sess, paste0(url, imgsrc))
# side-effect!
writeBin(img$response$content, basename(imgsrc))
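For the authenticated case in the question, the same idea might look like the sketch below. It is not tested against the asker's site. It assumes rvest 1.0.0 or later, where `html_session()`, `set_values()`, `submit_form()` and `jump_to()` were renamed to `session()`, `html_form_set()`, `session_submit()` and `session_jump_to()`. The two URLs and the form field names `username` and `password` are hypothetical placeholders.

```r
library(rvest)
library(httr)

# Hypothetical placeholders: substitute the real login page and image URL
login_url <- "https://example.com/login"
img_url   <- "https://example.com/protected/image.png"

# Log in so that the session carries the authentication cookies
sess <- session(login_url)
form <- html_form(sess)[[1]]
# Field names are assumptions; print `form` to see the real ones
form <- html_form_set(form, username = "***", password = "***")
sess <- session_submit(sess, form)

# Request the image through the same authenticated session
img <- session_jump_to(sess, img_url)
stop_for_status(img$response)   # fail early on 403 and similar errors

# Write the raw response body to disk as a binary file
writeBin(content(img$response, as = "raw"), basename(img_url))
```

The key point is that the image request goes through the logged-in session. `download.file()` starts a fresh, unauthenticated request, which is a likely reason it returns 403.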