R: Download an image using rvest

Question

I'm trying to download a PNG image from a secure (login-protected) site through R.

To access the secure site I used rvest, which worked well.

So far I've extracted the URL of the PNG image.

How can I download the image at this link using rvest?

Functions outside of rvest return errors because they don't have permission to access the site.

library(rvest)
library(httr)  # user_agent()
uastring <- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
# Open the login page, fill in the first form and submit it
session <- html_session("https://url.png", user_agent(uastring))
form <- html_form(session)[[1]]
form <- set_values(form, username = "***", password = "***", cookie_checkbox = TRUE)
session <- submit_form(session, form)
# Request the image with the logged-in session
session2 <- jump_to(session, "https://url.png")

## Status 200 using rvest: successfully accessed the page.
session2
# <session> https://url.png
#   Status: 200
#   Type:   image/png
#   Size:   438935

## Using download.file returns status 403: page unable to open.
download.file("https://url.png", destfile = "t.png")
#   cannot open: HTTP status was '403 Forbidden'

I have tried readPNG and download.file on the URL. Both failed because they lack permission to download from an authenticated secure site (error: 403), which is why I used rvest in the first place.
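The 403 is consistent with `download.file()` opening a fresh connection that does not carry the cookies set when logging in through rvest. A minimal sketch of forwarding them, assuming the site authenticates with ordinary session cookies (this is not guaranteed; some sites use tokens or other checks). `cookie_header()` is a hypothetical helper, and the cookie names and values in the example are made up.

```r
# Hypothetical helper: turn a cookie table (a data frame with `name` and
# `value` columns, as returned by httr::cookies()) into one Cookie header.
cookie_header <- function(ck) {
  paste(paste0(ck$name, "=", ck$value), collapse = "; ")
}

# Offline example with made-up cookies.
ck <- data.frame(name = c("sessionid", "csrftoken"),
                 value = c("abc123", "xyz789"),
                 stringsAsFactors = FALSE)
hdr <- cookie_header(ck)
print(hdr)
# [1] "sessionid=abc123; csrftoken=xyz789"

# Untested usage with the logged-in `session2` from the question:
# hdr <- cookie_header(httr::cookies(session2$response))
# download.file("https://url.png", destfile = "t.png", mode = "wb",
#               method = "libcurl", headers = c(Cookie = hdr))
```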

Answer

Here's an example that downloads the R logo into the current directory.

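library(rvest)
url <- "https://www.r-project.org"
# Find the first <img> on the page and read its src attribute
imgsrc <- read_html(url) %>%
  html_node(xpath = '//*/img') %>%
  html_attr('src')
imgsrc
# [1] "/Rlogo.png"

# side-effect! Writes the file to the working directory;
# mode = "wb" keeps the PNG from being corrupted on Windows
download.file(paste0(url, imgsrc), destfile = basename(imgsrc), mode = "wb")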
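`paste0(url, imgsrc)` only works when `src` is root-relative like `/Rlogo.png`; a relative path such as `images/banner.png` would produce a broken URL. A sketch using `xml2::url_absolute()` instead, assuming rvest >= 1.0 (for `html_elements()`). `image_urls()` is a hypothetical helper, and the inline HTML string stands in for a real page so the example runs offline.

```r
library(rvest)  # read_html(), html_elements(), html_attr()

# Hypothetical helper: absolute URLs for every <img src="..."> in a page.
# `base` should be the URL the page was fetched from.
image_urls <- function(doc, base) {
  src <- html_attr(html_elements(doc, "img"), "src")
  src <- src[!is.na(src)]          # skip <img> tags without a src attribute
  xml2::url_absolute(src, base)    # resolve relative paths against `base`
}

# Offline example: a literal HTML string instead of a downloaded page.
page <- read_html('<html><body>
  <img src="/Rlogo.png">
  <img src="images/banner.png">
  <img alt="no src here">
</body></html>')

urls <- image_urls(page, "https://www.r-project.org/about.html")
print(urls)
```

Each resulting URL can then be passed to `download.file(u, destfile = basename(u), mode = "wb")`.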

Edit

Because authentication is involved, you do need a session, as Austin suggested. Try this:

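library(rvest)
library(httr)
# `url` is the site from the example above; for a protected site, start
# from the logged-in session instead of a fresh html_session()
sess <- html_session(url)
imgsrc <- sess %>%
  read_html() %>%
  html_node(xpath = '//*/img') %>%
  html_attr('src')
# Request the image through the same session so its cookies are sent
img <- jump_to(sess, paste0(url, imgsrc))

# side-effect! Write the raw response bytes to disk
writeBin(img$response$content, basename(imgsrc))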
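In rvest 1.0.0 and later, `html_session()`, `jump_to()`, `set_values()` and `submit_form()` were superseded by `session()`, `session_jump_to()`, `html_form_set()` and `session_submit()`. A sketch of the same approach with the newer names, which also checks the PNG signature before writing, since a failed login can return an HTML page instead of the image. `save_session_png()` and `is_png()` are hypothetical helpers, and the URLs and form field names in the usage comment are placeholders.

```r
# First 8 bytes of every PNG file (the PNG signature).
png_signature <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))

# TRUE if a raw vector starts with the PNG signature.
is_png <- function(bytes) {
  length(bytes) >= 8 && identical(bytes[1:8], png_signature)
}

# Hypothetical helper: fetch `img_url` through an already logged-in
# rvest (>= 1.0) session and write it to `destfile`. Stops instead of
# writing if the body is not a PNG (e.g. an HTML login or error page).
save_session_png <- function(sess, img_url, destfile) {
  img <- rvest::session_jump_to(sess, img_url)      # reuses the session's cookies
  bytes <- httr::content(img$response, as = "raw")  # response body as raw bytes
  if (!is_png(bytes)) stop("Response from ", img_url, " is not a PNG")
  writeBin(bytes, destfile)
  invisible(destfile)
}

# Usage (placeholder URLs and form field names):
# sess <- rvest::session("https://example.com/login")
# form <- rvest::html_form(sess)[[1]]
# form <- rvest::html_form_set(form, username = "***", password = "***")
# sess <- rvest::session_submit(sess, form)
# save_session_png(sess, "https://example.com/private/image.png", "image.png")
```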

08-22 21:06