Reading a huge CSV file in R

Problem description

I have a huge CSV file, around 9 GB in size, and 16 GB of RAM. I followed the advice from the page quoted below and applied it.

If you get the error that R cannot allocate a vector of length x, close out of R and add the following line to the "Target" field:
--max-vsize=500M

I still get the error and warnings below. How can I read the 9 GB file into R? I am using 64-bit R 3.3.1 and running the command below in RStudio 0.99.903, on Windows Server 2012 R2 Standard (64-bit).

> memory.limit()
[1] 16383
> answer=read.csv("C:/Users/a-vs/results_20160291.csv")
Error: cannot allocate vector of size 500.0 Mb
In addition: There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
2: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
3: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
4: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
5: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
6: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
7: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
8: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
9: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
10: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
11: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
12: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)

------------------- Update1

My first attempt, based on the suggested answer:

> thefile=fread("C:/Users/a-vs/results_20160291.csv", header = T)
Read 44099243 rows and 36 (of 36) columns from 9.399 GB file in 00:13:34
Warning messages:
1: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv",  :
  Reached total allocation of 16383Mb: see help(memory.size)
2: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv",  :
  Reached total allocation of 16383Mb: see help(memory.size)

------------------- Update2

My second attempt, based on the suggested answer, is below:

thefile2 <- read.csv.ffdf(file="C:/Users/a-vs/results_20160291.csv", header=TRUE, VERBOSE=TRUE,
+                    first.rows=-1, next.rows=50000, colClasses=NA)
read.table.ffdf 1..
Error: cannot allocate vector of size 125.0 Mb
In addition: There were 14 warnings (use warnings() to see them)

How can I read this file into a single object so that I can analyze all of the data in one go?

We bought an expensive machine with 10 cores and 256 GB of RAM. That is not the most efficient solution, but it works, at least for the near future. I looked at the answers below and I don't think they solve my problem :( though I appreciate them. I want to perform a market basket analysis, and I don't think there is any way around keeping my data in RAM.

Recommended answer

Make sure you're using 64-bit R, not just 64-bit Windows, so that you can increase your RAM allocation to all 16 GB.
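
One way to confirm which build of R is running is to look at the pointer size R reports. This is a minimal sketch using only base R; the variable name is_64bit is illustrative and not part of the original answer.

```r
# A pointer size of 8 bytes indicates a 64-bit build of R; 4 bytes indicates 32-bit.
is_64bit <- .Machine$sizeof.pointer == 8

# R.version$arch names the architecture R was built for (e.g. "x86_64").
cat("Architecture:", R.version$arch, "\n")
cat("64-bit R:", is_64bit, "\n")
```

The memory.limit() output shown in the question (16383) suggests that a 64-bit build was most likely already in use there, so this check mainly rules out an accidental 32-bit session.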

In addition, you can read the file in chunks:

file_in    <- file("in.csv","r")
chunk_size <- 100000 # choose the best size for you
x          <- readLines(file_in, n=chunk_size)
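
The snippet above reads only the first chunk. A minimal sketch of a complete loop, assuming the file has a header row and no quoted fields containing line breaks, might look like the following. The demo file, the column names id and value, and the running totals are all made up for illustration; the idea is to keep only a small summary per chunk rather than the whole data set.

```r
# Build a small demo CSV so the sketch is runnable; replace csv_path with your own file.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:25, value = (1:25) * 2), csv_path, row.names = FALSE)

chunk_size <- 10                      # lines per chunk; something like 1e5 for real data
con    <- file(csv_path, "r")
header <- readLines(con, n = 1)       # keep the header to re-attach to every chunk

total_rows <- 0
value_sum  <- 0
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break       # nothing left to read: end of file
  # Parse this chunk on its own; only the running totals persist between chunks.
  chunk <- read.csv(text = c(header, lines), stringsAsFactors = FALSE)
  total_rows <- total_rows + nrow(chunk)
  value_sum  <- value_sum + sum(chunk$value)
}
close(con)                            # always release the connection

cat("rows:", total_rows, " sum of value:", value_sum, "\n")
```

This only helps when the analysis can be expressed as a per-chunk computation whose results are combined afterwards; it does not by itself produce one in-memory object holding all rows.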

You can use data.table to read and manipulate large files more efficiently:

library(data.table)
dt <- fread("in.csv", header = TRUE)  # returns a data.table
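
If the full table does not fit comfortably in RAM, fread can also load just a preview or a subset of columns, which reduces the memory needed. The sketch below assumes the data.table package is installed; the demo file and the column names id, item and price are hypothetical.

```r
library(data.table)

# Write a small demo file so the sketch is runnable; replace csv_path with your own file.
csv_path <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:100, item = rep(c("a", "b"), 50), price = runif(100)), csv_path)

# Read only the first few rows to inspect column names and types cheaply.
preview <- fread(csv_path, nrows = 5)
print(sapply(preview, class))

# Read only the columns the analysis needs; the others are never loaded into memory.
dt <- fread(csv_path, select = c("id", "item"))
print(dim(dt))
```

Whether dropping columns is acceptable depends on the analysis; for a market basket analysis, a transaction identifier and an item column may be all that is required, but that is an assumption about the data in the question.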

If needed, you can use ff to keep the data on disk instead of in RAM:

library("ff")
x <- read.csv.ffdf(file="file.csv", header=TRUE, VERBOSE=TRUE,
                   first.rows=10000, next.rows=50000, colClasses=NA)


08-11 19:30