rvest web scraping with form input in R

Problem description

I can't get my head around this problem in R, and I would really appreciate it if you could leave a piece of advice for me here.

I am trying to scrape historical bond yield data from https://www.investing.com/rates-bonds/spain-5-year-bond-yield-historical-data for personal use only (of course).

The solution provided in "webscraping data tables and data from a web page" works really well, but it only scrapes the first 24 time stamps of daily data.

What I am trying to achieve is to change the date range in order to scrape more historical data. According to the SelectorGadget tool, the date-range input field is matched by the XPath //*[(@id = "widgetFieldDateRange")], i.e. its id is widgetFieldDateRange.

I have also tried using the following lines of code to change the date values but without success:

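library(rvest)

url1 <- "https://www.investing.com/rates-bonds/spain-5-year-bond-yield-historical-data" #Spain 5yr yield

# Open the page and take the first form on it
session <- html_session(url1)
pgform <- html_form(session)[[1]]

# Overwrite the third field (taken to be the date range) and resubmit;
# this did not change the data that came back
pgform$fields[[3]]$value <- "01/01/2010 - 09/10/2020"
result <- submit_form(session, pgform)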

Question: Any idea how to correctly submit the new date range and retrieve the extended time series?

Thank you very much for your help!

PS: Unfortunately, the URL does not change based on the date range.

Recommended answer

You can perform the POST request directly:

POST https://www.investing.com/instruments/HistoricalDataAjax

You need to scrape a few pieces of information from the page that the request requires:

  • the pair_ids attribute of a div tag
  • the text of the h2 tag inside the .instrumentHeader element
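Changing the date range comes down to the st_date and end_date fields of the POST body in the full code, which are formatted as MM/DD/YYYY. With encode = "form", httr sends that body as application/x-www-form-urlencoded. The base-R sketch below only illustrates roughly what such a body looks like on the wire; build_history_body() and form_encode() are hypothetical helpers, and the curr_id and header values in the usage line are placeholders, not real investing.com values.

```r
# Hypothetical helper: collect the form fields used in the answer's POST body.
build_history_body <- function(curr_id, header, start_date, end_date) {
  list(
    curr_id      = curr_id,
    header       = header,
    st_date      = format(as.Date(start_date), "%m/%d/%Y"),  # MM/DD/YYYY
    end_date     = format(as.Date(end_date), "%m/%d/%Y"),
    interval_sec = "Daily",
    sort_col     = "date",
    sort_ord     = "DESC",
    action       = "historical_data"
  )
}

# Hypothetical helper: percent-encode each value and join as key=value&key=value.
# httr does its own encoding when encode = "form"; this is for illustration only.
form_encode <- function(fields) {
  values <- vapply(fields,
                   function(v) URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste(names(fields), values, sep = "=", collapse = "&")
}

# Placeholder curr_id/header values, not taken from the real page
body <- build_history_body("12345", "Placeholder Header", "2010-01-01", "2020-10-09")
cat(form_encode(body), "\n")
```

Requesting ten years instead of a few months only changes the two date strings; the rest of the body stays the same.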

Full code:

library(rvest)
library(httr)

# Date range to request; sent to the server as MM/DD/YYYY
startDate <- as.Date("2020-06-01")
endDate <- Sys.Date() #today

userAgent <- "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
mainUrl <- "https://www.investing.com/rates-bonds/spain-5-year-bond-yield-historical-data"

# Load the page once: this provides the session (cookies) and the values below
s <- html_session(mainUrl)

# Instrument id, sent as curr_id
pair_ids <- s %>%
    html_nodes("div[pair_ids]") %>%
    html_attr("pair_ids")

# Page header text, sent as header
header <- s %>% html_nodes(".instrumentHeader h2") %>% html_text()

# POST to the Ajax endpoint within the same session.
# rvest:::request_POST is an unexported rvest helper (note the :::),
# so it may not be available in every rvest version.
resp <- s %>% rvest:::request_POST(
    "https://www.investing.com/instruments/HistoricalDataAjax",
    # Identify the request as an XMLHttpRequest (Ajax) call
    add_headers('X-Requested-With'= 'XMLHttpRequest'),
    user_agent(userAgent),
    body = list(
        curr_id = pair_ids,
        header = header[[1]],
        st_date = format(startDate, format="%m/%d/%Y"),
        end_date = format(endDate, format="%m/%d/%Y"),
        interval_sec = "Daily",
        sort_col = "date",
        sort_ord = "DESC",
        action = "historical_data"
    ),
    encode = "form") %>%
    html_table

print(resp[[1]])

Output:

            Date  Price   Open   High    Low Change %
1   Oct 09, 2020 -0.339 -0.338 -0.333 -0.361    2.42%
2   Oct 08, 2020 -0.331 -0.306 -0.306 -0.338    7.47%
3   Oct 07, 2020 -0.308 -0.323 -0.300 -0.324   -0.65%
4   Oct 06, 2020 -0.310 -0.288 -0.278 -0.319    7.27%
5   Oct 05, 2020 -0.289 -0.323 -0.278 -0.331  -10.39%
6   Oct 03, 2020 -0.322 -0.322 -0.322 -0.322    1.42%
7   Oct 02, 2020 -0.318 -0.311 -0.302 -0.320    5.65%
.....................................................
.....................................................
96  Jun 08, 2020 -0.162 -0.152 -0.133 -0.173   13.29%
97  Jun 05, 2020 -0.143 -0.129 -0.127 -0.154   13.49%
98  Jun 04, 2020 -0.126 -0.089 -0.063 -0.148   38.46%
99  Jun 03, 2020 -0.091 -0.120 -0.087 -0.128  -35.00%
100 Jun 02, 2020 -0.140 -0.148 -0.137 -0.166   14.75%
101 Jun 01, 2020 -0.122 -0.140 -0.101 -0.150  -17.57%
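The Date column comes back as text such as "Oct 09, 2020" and the Change % column as strings such as "2.42%". A base-R sketch for turning them into a Date and a number and sorting oldest-first; tidy_history() is a hypothetical helper, and it assumes the column names shown in the output above ("Date", "Change %"). It matches month abbreviations against month.abb instead of using the %b format, so the result does not depend on the system locale.

```r
# Hypothetical helper: convert the text columns of the scraped table.
tidy_history <- function(df) {
  d <- df$Date
  # "Oct 09, 2020" -> month abbreviation, day, year (locale-independent)
  month <- match(sub("^([A-Za-z]{3}) .*$", "\\1", d), month.abb)
  day   <- as.integer(sub("^[A-Za-z]{3} ([0-9]{1,2}),.*$", "\\1", d))
  year  <- as.integer(sub("^.*, ([0-9]{4})$", "\\1", d))
  df$Date <- as.Date(ISOdate(year, month, day))

  # "2.42%" -> 2.42 (still expressed in percent)
  df[["Change %"]] <- as.numeric(sub("%", "", df[["Change %"]], fixed = TRUE))

  # Oldest observation first, the usual order for a time series
  df[order(df$Date), ]
}
```

For example, tidy_history(resp[[1]]) returns the same rows from Jun 01, 2020 up to the end date.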

This also works for any other historical-data page if you replace the value of the mainUrl variable, for instance this one.
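To reuse the approach across pages, the steps can be wrapped in a function of the URL and dates. This is a sketch under assumptions, not the answer's exact code: get_history() is a hypothetical name, and it swaps the internal rvest:::request_POST for plain httr::GET and httr::POST, relying on httr by default reusing one connection handle (and its cookies) per host. It also assumes the page still exposes div[pair_ids] and ".instrumentHeader h2" and that the endpoint still accepts these form fields.

```r
# Sketch: fetch the historical-data table for an investing.com instrument page.
# Assumes the page structure and Ajax endpoint used in the answer still apply.
get_history <- function(mainUrl, startDate, endDate,
                        userAgent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36") {
  ua <- httr::user_agent(userAgent)

  # GET the page; httr reuses the handle for this host, so cookies carry over
  page_resp <- httr::GET(mainUrl, ua)
  httr::stop_for_status(page_resp)
  page <- xml2::read_html(httr::content(page_resp, as = "text", encoding = "UTF-8"))

  # Instrument id and header text, as in the answer
  pair_ids <- rvest::html_attr(rvest::html_nodes(page, "div[pair_ids]"), "pair_ids")
  header <- rvest::html_text(rvest::html_nodes(page, ".instrumentHeader h2"))

  resp <- httr::POST(
    "https://www.investing.com/instruments/HistoricalDataAjax",
    httr::add_headers(`X-Requested-With` = "XMLHttpRequest"),
    ua,
    body = list(
      curr_id = pair_ids[[1]],
      header = header[[1]],
      st_date = format(as.Date(startDate), "%m/%d/%Y"),
      end_date = format(as.Date(endDate), "%m/%d/%Y"),
      interval_sec = "Daily",
      sort_col = "date",
      sort_ord = "DESC",
      action = "historical_data"
    ),
    encode = "form"
  )
  httr::stop_for_status(resp)

  # The response is an HTML fragment containing the table
  tables <- rvest::html_table(
    xml2::read_html(httr::content(resp, as = "text", encoding = "UTF-8")))
  tables[[1]]
}
```

A call would then look like get_history(mainUrl, "2010-01-01", Sys.Date()), with mainUrl set to whichever historical-data page you need.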

08-22 21:28