How to bind data.tables without increasing memory consumption?

Problem description

I have several huge data.tables dt_1, dt_2, ..., dt_N with the same columns. I want to bind them together into a single data.table. If I use

dt <- rbind(dt_1, dt_2, ..., dt_N)

or

dt <- rbindlist(list(dt_1, dt_2, ..., dt_N))

then the memory usage is approximately double the amount needed for dt_1, dt_2, ..., dt_N. Is there a way to bind them without significantly increasing memory consumption? Note that I do not need dt_1, dt_2, ..., dt_N once they are combined.

Recommended answer

Another approach, using a temporary file to 'bind':

library(data.table)

nobs <- 10000
d1 <- d2 <- d3 <- data.table(a = rnorm(nobs), b = rnorm(nobs))
ll <- c('d1', 'd2', 'd3')
tmp <- tempfile()

# Write all, writing header only for the first one
for(i in seq_along(ll)) {
  write.table(get(ll[i]),tmp,append=(i!=1),row.names=FALSE,col.names=(i==1))
}

# 'Clean up' the original objects (their memory is reclaimed by the gc when needed, e.g. while loading the file)
rm(list=ll)

# Read the file in the new object
dt<-fread(tmp)

# Remove the file
unlink(tmp)

This is obviously slower than the rbind method, but if you have memory contention, it won't be slower than forcing the system to swap out memory pages.
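The same temporary-file idea can also be sketched with data.table's own fwrite, dropping each table as soon as it has been written so that at most one source table plus the file contents need to be live at once. This is only a variant under assumptions: the table names, sizes, and the total_rows check are illustrative, and round-tripping through a text file may not preserve doubles bit-for-bit.

```r
library(data.table)

# Illustrative tables; in practice these would be the large dt_1, ..., dt_N
n_obs <- 1000
d1 <- data.table(a = rnorm(n_obs), b = rnorm(n_obs))
d2 <- data.table(a = rnorm(n_obs), b = rnorm(n_obs))
d3 <- data.table(a = rnorm(n_obs), b = rnorm(n_obs))
table_names <- c("d1", "d2", "d3")
total_rows <- nrow(d1) + nrow(d2) + nrow(d3)  # kept only for the sanity check below

tmp <- tempfile(fileext = ".csv")
for (i in seq_along(table_names)) {
  # Write the header only with the first table, then append the rest
  fwrite(get(table_names[i]), tmp, append = (i != 1), col.names = (i == 1))
  # Drop each table as soon as it is on disk
  rm(list = table_names[i])
}
gc()  # give R a chance to release the freed memory before reading back

dt <- fread(tmp)
unlink(tmp)

# Sanity check: no rows were lost in the round trip
stopifnot(nrow(dt) == total_rows)
```

Removing the tables inside the loop, rather than all at once after it, is a design choice that trades nothing in speed for a lower peak memory footprint.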

Of course, if your original objects were loaded from files in the first place, prefer concatenating the files before loading them into R, using a tool better suited to working with files (cat, awk, etc.).
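As a rough sketch of that last suggestion, the concatenation can be delegated to awk and the result read directly through fread's cmd argument. The file names and contents below are hypothetical and created only so the example runs; it assumes a Unix-like system with awk on the PATH and that every file starts with the same header line.

```r
library(data.table)

# Hypothetical input files, created here only to make the sketch runnable
files <- file.path(tempdir(), sprintf("part_%d.csv", 1:3))
for (f in files) fwrite(data.table(a = 1:5, b = letters[1:5]), f)

# awk program: skip the first line (header) of every file except the first
awk_prog <- "FNR == 1 && NR != 1 { next } { print }"
cmd <- paste("awk", shQuote(awk_prog), paste(shQuote(files), collapse = " "))

# fread runs the command and parses its output, so the individual files
# are never loaded into R as separate tables
dt <- fread(cmd = cmd)

unlink(files)
```

If the files have no header lines, plain cat would be enough in place of the awk program.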
