Fast way to download a very large (14,000,000 row) csv from a zip file? unzip, read_csv and read.csv never finish loading

Problem description

I am trying to download the dataset at the link below. It is about 14,000,000 rows long. I ran this code chunk and got stuck at unzip(). The code has been running for a really long time and my computer is hot.

I tried a few different ways that don't use unzip, and then I get stuck at the read.csv/vroom/read_csv step. Any ideas? This is a public dataset, so anyone can try it.

library(vroom)

# download the zip to a temporary file
temp <- tempfile()
download.file("https://files.consumerfinance.gov/hmda-historic-loan-data/hmda_2017_nationwide_all-records_labels.zip", temp)

# extract the csv into the working directory (this is the step that hangs)
unzip(temp, "hmda_2017_nationwide_all-records_labels.csv")

# read the extracted csv
df2017 <- vroom("hmda_2017_nationwide_all-records_labels.csv")

unlink(temp)

Recommended answer

I was able to download the file to my computer first, then use vroom (https://vroom.r-lib.org/) to load it without unzipping it:

library(vroom)

# read the zip directly; no separate unzip step is needed
df2017 <- vroom("hmda_2017_nationwide_all-records_labels.zip")

I get a warning about possible truncation, but the object has these dimensions:

> dim(df2017)
[1] 5448288      78
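
Given that warning (the question mentions roughly 14,000,000 rows, while dim() reports about 5.4 million), it may be worth counting the rows independently. A minimal sketch using base R's unz() connection to stream the csv out of the zip without extracting it; it assumes the downloaded zip sits in the working directory under its original name:

con <- unz("hmda_2017_nationwide_all-records_labels.zip",
           "hmda_2017_nationwide_all-records_labels.csv")
open(con, "r")
n <- 0L
repeat {
  chunk <- readLines(con, n = 1e6L)  # stream one million lines at a time
  if (length(chunk) == 0L) break
  n <- n + length(chunk)
}
close(con)
n - 1L  # data rows, excluding the header line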

One nice thing about vroom is that it doesn't load the data straight into memory.
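
Building on that lazy loading, vroom's col_select argument can also restrict parsing to just the columns you need, which keeps memory use down on a file this size. A minimal sketch, assuming these HMDA column names (not confirmed in the original answer; check them against the actual header):

library(vroom)

# parse only a subset of columns; the names below are assumptions
# based on the HMDA labels file and may need adjusting
df_small <- vroom(
  "hmda_2017_nationwide_all-records_labels.zip",
  col_select = c(action_taken_name, loan_amount_000s, state_abbr)
)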
