Problem description
I'm a beginner and I have a problem with scraping.
I need to get data about the active/inactive status of VIES numbers for a few clients. For now, I am trying just one. On the website I have to set the values and submit the form; the browser then redirects to the next page, where I can find the data I am interested in.
My code is below. Maybe someone can help.
library(rvest)
library(XML)
url <- 'http://ec.europa.eu/taxation_customs/vies/vatResponse.html?locale=pl'
session1 <- html_session(url)
form1 <- html_form(session1)
form1
date <- set_values(form1[[1]], requesterMemberStateCode = "AT-Austria", requesterNumber = "4324")
date
set <- submit_form(session = session1, form = date)
Recommended answer
First of all, you don't need the XML package; rvest is enough.
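You had the form submission part almost right; you just used the wrong field names. The number to check goes in memberStateCode and number, not in the requester* fields, and the select value is the country code "AT" rather than the label "AT-Austria".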
library(rvest)
#> Loading required package: xml2
url <- 'http://ec.europa.eu/taxation_customs/vies/vatResponse.html?locale=pl'
session1 <- html_session(url)
form1 <-html_form(session1)
form1[[1]]
#> <form> 'vowRequest' (POST vatResponse.html)
#> <select> 'memberStateCode' [0/29]
#> <input text> '': --
#> <input text> 'number':
#> <input text> 'traderName':
#> <select> 'traderCompanyType' [0/0]
#> <input text> 'traderStreet':
#> <input text> 'traderPostalCode':
#> <input text> 'traderCity':
#> <select> 'requesterMemberStateCode' [0/30]
#> <input text> '':
#> <input text> 'requesterNumber':
#> <input hidden> 'action': check
#> <input submit> 'check': Weryfikuj
date <- set_values(form1[[1]], memberStateCode = "AT", number = "4324")
set <- submit_form(session = session1,form = date)
#> Submitting with 'NULL'
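As a side note, in rvest 1.0 and later `html_session()`, `set_values()`, and `submit_form()` are deprecated in favour of `session()`, `html_form_set()`, and `session_submit()`. The sketch below shows how the field names can be checked before filling the form with the newer functions. It assumes rvest >= 1.0 and uses a hypothetical, cut-down copy of the VIES form built with `minimal_html()` so that it runs offline; the real page has more fields.

```r
library(rvest)

# Hypothetical, cut-down copy of the VIES form (assumption: the real form
# has more fields) so that the example runs without network access.
html <- minimal_html('
  <form name="vowRequest" method="post" action="/vatResponse.html">
    <select name="memberStateCode">
      <option value="AT">AT-Austria</option>
      <option value="PL">PL-Poland</option>
    </select>
    <input type="text" name="number">
    <select name="requesterMemberStateCode">
      <option value="AT">AT-Austria</option>
    </select>
    <input type="text" name="requesterNumber">
    <input type="submit" name="check" value="Weryfikuj">
  </form>
')

form <- html_form(html)[[1]]

# The argument names accepted by html_form_set() are the `name`
# attributes of the form fields, so list them first.
field_names <- names(form$fields)
print(field_names)

# Fill in the fields (rvest >= 1.0 replacement for set_values()).
filled <- html_form_set(form, memberStateCode = "AT", number = "4324")
print(filled$fields$number$value)
```

On the live site, the form would come from `html_form(session(url))` and the filled form would then be passed to `session_submit()` together with the session.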
After that, extracting the values you are interested in is easy:
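set %>%
  read_html() %>%
  html_table(fill = TRUE) %>%
  purrr::pluck(1) %>%                 # first table on the response page
  dplyr::slice(4:dplyr::n()) %>%      # drop the first three rows
  dplyr::select(1:2)                  # keep the label and value columns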
#> # A tibble: 6 x 2
#> X1 X2
#> <chr> <chr>
#> 1 Państwo Członkowskie AT
#> 2 Numer VAT AT 4324
#> 3 Data zapytania 2018/05/17 14:33:10
#> 4 Nazwa ---
#> 5 Adres ---
#> 6 Identyfikator zapytania ""
Created on 2018-05-17 by the reprex package (v0.2.0).
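If you need to look up individual values by label (for example when checking several clients), the two-column result can be turned into a named character vector. This is only a sketch: it assumes rvest >= 1.0 and uses a hypothetical inline table that mimics a few rows of the output above instead of the live response.

```r
library(rvest)

# Hypothetical stand-in for the response page: a label/value table
# with a few of the rows shown in the output above.
html <- minimal_html('
  <table>
    <tr><td>Numer VAT</td><td>AT 4324</td></tr>
    <tr><td>Data zapytania</td><td>2018/05/17 14:33:10</td></tr>
    <tr><td>Nazwa</td><td>---</td></tr>
    <tr><td>Adres</td><td>---</td></tr>
  </table>
')

tbl <- html_table(html)[[1]]

# Use the first column as names and the second column as values.
result <- setNames(tbl[[2]], tbl[[1]])
print(result[["Numer VAT"]])
```

With the real response, `tbl` would be the data frame produced by the pipeline above.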