Problem description
I want to read a file that is compressed inside open-plaques-all-2017-06-19.rar directly from a website, but I could not get it to work in R. My code is below:
temp <- tempfile()
download.file("https://github.com/tuyenhavan/Statistics/blob/master/open-plaques-all-2017-06-19.rar", temp)
df<- fread(unzip(temp, files = "open-plaques-all-2017-06-19.csv"))
head(df)
Recommended answer
You'll need the libarchive development library, which goes by these names on the respective platforms/package managers:
- deb: libarchive-dev (Debian, Ubuntu, etc.)
- rpm: libarchive-devel (Fedora, CentOS, RHEL)
- csw: libarchive_dev (Solaris)
- brew: libarchive (Mac OSX)
Windows users will have precompiled binaries downloaded automatically.
Then install the archive package from GitHub:
devtools::install_github("jimhester/archive")
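Here's one workflow. Note that the URL you specified is not valid for downloading: a GitHub "blob" URL points to the HTML page for the file, not the file itself. You need to use the "raw" URL to get the actual file. Also, unzip() only handles zip archives, so a .rar file needs a different extractor, such as the archive package.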
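library(archive)
# Download the archive to a temporary file; the "?raw=true" URL returns the file itself
tf1 <- tempfile(fileext = ".rar")
download.file("https://github.com/tuyenhavan/Statistics/blob/master/open-plaques-all-2017-06-19.rar?raw=true", tf1, mode = "wb")  # "wb" keeps binary data intact on Windows
# Extract the archive into a temporary directory and list what came out
tf2 <- tempfile()
archive_extract(tf1, tf2)
list.files(tf2)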
## [1] "open-plaques-all-2017-06-19.csv"
file.size(file.path(tf2, list.files(tf2)))
## [1] 26942816
xdf <- readr::read_csv(file.path(tf2, list.files(tf2)))
dplyr::glimpse(xdf)
## Observations: 38,436
## Variables: 27
## $ id <int> 29923, 42945, 42944, 42943, 42942, 42941, 42940, ...
## $ title <chr> "Jon Pertwee blue plaque", "Apsley Cherry-Garrard...
## $ inscription <chr> "Jon Pertwee 1919-1996 Doctor Who 1970-1974", "Ap...
## $ latitude <dbl> NA, NA, NA, NA, NA, NA, 54.14910, 45.76330, NA, 4...
## $ longitude <dbl> NA, NA, NA, NA, NA, NA, -4.46938, 4.83157, NA, 4....
## $ country <chr> "United Kingdom", "United Kingdom", "United Kingd...
## $ area <chr> "London", "Bedford", "Harlow", "Bozen", "Adro", "...
## $ address <chr> "BBC Television Centre", "Lansdowne Road", "The W...
## $ erected <int> NA, NA, NA, NA, NA, 2016, NA, NA, NA, NA, NA, NA,...
## $ main_photo <chr> NA, "https://commons.wikimedia.org/wiki/Special:F...
## $ colour <chr> "blue", "blue", "blue", "brass", "brass", "brass"...
## $ organisations <chr> "[]", "[]", "[\"Harlow Civic Society\"]", "[\"Gun...
## $ language <chr> "English", "English", "English", "Italian", "Ital...
## $ series <chr> NA, NA, NA, "Stolpersteine Italiano", "Stolperste...
## $ series_ref <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ `geolocated?` <chr> "false", "false", "false", "false", "false", "fal...
## $ `photographed?` <chr> "false", "true", "false", "true", "true", "true",...
## $ number_of_subjects <int> 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0...
## $ lead_subject_name <chr> "Jon Pertwee", "Apsley Cherry-Garrard", NA, NA, "...
## $ lead_subject_born_in <int> 1919, 1886, NA, NA, 1911, 1913, NA, 1888, 1832, 1...
## $ lead_subject_died_in <int> 1996, 1959, NA, NA, 1945, 1945, NA, 1967, 1898, 1...
## $ lead_subject_type <chr> "man", "man", NA, NA, "man", "man", NA, "man", "m...
## $ lead_subject_roles <chr> "[\"Doctor Who\", \"actor\", \"entertainer\", \"t...
## $ lead_subject_wikipedia <chr> "https://en.wikipedia.org/wiki/Jon_Pertwee", "htt...
## $ lead_subject_dbpedia <chr> "http://dbpedia.org/resource/Jon_Pertwee", "http:...
## $ lead_subject_image <chr> "https://commons.wikimedia.org/wiki/Special:FileP...
## $ subjects <chr> "[\"Jon Pertwee|(1919-1996)|man|Doctor Who, actor...
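The raw-URL point above can be checked before extraction. The following is a minimal sketch, not part of the original answer: both function names are hypothetical helpers, and the `/raw/` path form is an assumption about how GitHub serves files (the `?raw=true` form used in the answer is the one the source confirms). The second helper relies on the fact that RAR files begin with the bytes "Rar!", so a downloaded HTML page would fail the check.

```r
# Hypothetical helper: turn a GitHub "blob" page URL into a direct-download URL.
# Assumption: GitHub serves the file itself under the /raw/ path form.
github_raw_url <- function(url) {
  sub("/blob/", "/raw/", url, fixed = TRUE)
}

# Hypothetical helper: TRUE if the file starts with the RAR signature "Rar!".
looks_like_rar <- function(path) {
  header <- readBin(path, what = "raw", n = 4L)
  identical(header, charToRaw("Rar!"))
}

# Possible usage (needs network access, so it is left commented out):
# url <- github_raw_url("https://github.com/tuyenhavan/Statistics/blob/master/open-plaques-all-2017-06-19.rar")
# tf1 <- tempfile(fileext = ".rar")
# download.file(url, tf1, mode = "wb")
# stopifnot(looks_like_rar(tf1))
```

If `looks_like_rar()` returns FALSE, the download most likely fetched a web page rather than the archive, which is what happens with the URL in the question.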
Consider unlink()ing tf1, copying the file(s) from tf2 somewhere more permanent, and then unlink()ing tf2 to clean up after the work is completed.
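The cleanup described above could be sketched roughly as follows. This is an illustration only: `keep_and_clean()` and its argument names are hypothetical and not part of the archive package, and the sketch assumes the extraction directory holds plain files rather than subdirectories.

```r
# Hypothetical helper (not part of any package): copy extracted files to a
# permanent directory, then delete the temporary archive and extraction dir.
keep_and_clean <- function(archive_path, extract_dir, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  extracted <- list.files(extract_dir, full.names = TRUE)
  copied <- file.copy(extracted, dest_dir, overwrite = TRUE)
  # Only delete the temporary copies once every file was copied successfully.
  if (length(copied) > 0 && all(copied)) {
    unlink(archive_path)
    unlink(extract_dir, recursive = TRUE)  # recursive = TRUE is required for directories
  }
  # Return the new file paths invisibly so they can be read from later.
  invisible(file.path(dest_dir, basename(extracted)))
}
```

With the objects from the workflow above, a call might look like `keep_and_clean(tf1, tf2, "data")`, where "data" is a placeholder for whatever permanent directory suits your project.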