Problem description
I need to rbind two large data frames. Right now I use
df <- rbind(df, df.extension)
but I (almost) instantly run out of memory. I guess it's because df is held in memory twice. I might see even bigger data frames in the future, so I need some kind of in-place rbind.
So my question is: Is there a way to avoid data duplication in memory when using rbind?
I found this question, which uses SQLite, but I really want to avoid using the hard drive as a cache.
For now I have worked out the following solution:
nextrow <- nrow(df) + 1
df[nextrow:(nextrow + nrow(df.extension) - 1), ] <- df.extension
# we need to ensure unique row names
row.names(df) <- 1:nrow(df)
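For illustration, here is a minimal, self-contained sketch of this indexed-assignment approach; the two small data frames are made up purely so the example is runnable on its own:

# toy data frames standing in for the real df and df.extension
df <- data.frame(x = 1:3, y = c(10, 20, 30))
df.extension <- data.frame(x = 4:5, y = c(40, 50))

# assign the new rows into positions just past the current last row, which extends df
nextrow <- nrow(df) + 1
df[nextrow:(nextrow + nrow(df.extension) - 1), ] <- df.extension
row.names(df) <- 1:nrow(df)   # restore unique, sequential row names

str(df)   # 5 obs. of 2 variables, original column types preserved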
Now I don't run out of memory. I think it's because I store
object.size(df) + 2 * object.size(df.extension)
while with rbind R would need
object.size(rbind(df, df.extension)) + object.size(df) + object.size(df.extension).
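If you want to sanity-check this accounting, object.size() reports per-object sizes in bytes; the lines below are only a rough illustration of the estimate above, not a real memory profile (and the last line does allocate the combined frame):

size.df  <- as.numeric(object.size(df))            # bytes used by df
size.ext <- as.numeric(object.size(df.extension))  # bytes used by df.extension
size.df + 2 * size.ext                              # rough peak estimate for the indexed assignment
size.df + size.ext + as.numeric(object.size(rbind(df, df.extension)))  # rough peak estimate for rbind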
After that I use
rm(df.extension)
gc(reset=TRUE)
to free the memory I don't need anymore.
This solves my problem for now, but I feel there is a more advanced way to do a memory-efficient rbind. I appreciate any comments on this solution.
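One option that is often suggested for this kind of problem (it is not part of the original post, so treat it as a hedged pointer rather than a drop-in fix) is data.table::rbindlist(), which builds the combined table in a single pass; whether it actually lowers peak memory for a given workload has to be measured:

library(data.table)

# combine the two frames with data.table's rbindlist instead of base rbind
combined <- rbindlist(list(df, df.extension))

# optionally convert back to a plain data.frame by reference
setDF(combined)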