Problem description
I am trying to download the dataset at the link below. It is about 14,000,000 rows long. I ran this code chunk and got stuck at unzip(): the code has been running for a really long time and my computer is hot.
I tried a few different approaches that don't use unzip(), and then I get stuck at the read.csv/vroom/read_csv step. Any ideas? This is a public dataset, so anyone can try it.
library(vroom)
temp <- tempfile(fileext = ".zip")
# mode = "wb" keeps the zip from being corrupted on Windows
download.file("https://files.consumerfinance.gov/hmda-historic-loan-data/hmda_2017_nationwide_all-records_labels.zip", temp, mode = "wb")
unzip(temp, "hmda_2017_nationwide_all-records_labels.csv")
df2017 <- vroom("hmda_2017_nationwide_all-records_labels.csv")
unlink(temp)
Accepted answer
I was able to download the file to my computer first, then use vroom (https://vroom.r-lib.org/) to load it without unzipping it:
library(vroom)
df2017 <- vroom("hmda_2017_nationwide_all-records_labels.zip")
I get a warning about possible truncation, but the object has these dimensions:
> dim(df2017)
[1] 5448288 78
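If you want to see exactly what triggered the truncation warning, vroom records parse issues that you can inspect with its problems() helper. A minimal sketch (it re-reads the same zip, so expect it to take a while):

```r
library(vroom)

df2017 <- vroom("hmda_2017_nationwide_all-records_labels.zip")

# problems() returns a tibble describing each row/column vroom
# could not parse as expected (expected vs. actual value, file offset)
print(problems(df2017))
```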
One nice thing about vroom is that it doesn't load the data straight into memory: it indexes the file and materializes values lazily, only as they are accessed.
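To lean on that laziness even further, vroom's col_select argument parses only the columns you name, which keeps both indexing and later materialization cheaper. A sketch, where the column names are assumptions about the HMDA 2017 file rather than verified:

```r
library(vroom)

# Read only two columns from the zipped CSV; the column names
# (action_taken_name, loan_amount_000s) are assumed, not verified
df_subset <- vroom("hmda_2017_nationwide_all-records_labels.zip",
                   col_select = c(action_taken_name, loan_amount_000s))
```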