Memory limit in data.table: negative length vectors are not allowed

Problem description

I have a data table with several social media users and their followers. The original data table has the following format:

X.USERID FOLLOWERS
1081     4053807021,2476584389,4713715543, ...

So each row contains a user's ID together with a vector of his/her followers (separated by commas). In total I have 24,000 unique user IDs and 160,000,000 unique followers. I want to convert my original table into the following format:

X.USERID          FOLLOWERS
1:     1081         4053807021
2:     1081         2476584389
3:     1081         4713715543
4:     1081          580410695
5:     1081         4827723557
6:     1081 704326016165142528

In order to get this data table I used the following line of code (assume that my original data table is called dt):

uf <- dt[,list(FOLLOWERS = unlist(strsplit(x = FOLLOWERS, split= ','))), by = X.USERID]
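To see what this line does on a small scale, here is a minimal sketch that runs the same strsplit()/unlist() expression, grouped by X.USERID, on a hypothetical two-user table (the second ID, 2002, and its follower list are made up for illustration). It adds fixed = TRUE, which makes strsplit() treat "," as a literal separator rather than a regular expression; the output is the same, and literal matching is usually cheaper.

```r
library(data.table)

# Hypothetical toy input in the same shape as the question's dt:
# one row per user, followers stored as a single comma-separated string.
dt <- data.table(
  X.USERID  = c(1081L, 2002L),
  FOLLOWERS = c("4053807021,2476584389,4713715543", "580410695")
)

# Same reshaping as the question: split each user's string and
# return one row per follower, grouped by user ID.
# fixed = TRUE matches "," literally instead of as a regex.
uf <- dt[, list(FOLLOWERS = unlist(strsplit(FOLLOWERS, split = ",", fixed = TRUE))),
         by = X.USERID]
print(uf)
```

The follower IDs stay character strings, which is what strsplit() returns.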

However when I run this code on the entire dataset I get the following error:

negative length vectors are not allowed

According to this Stack Overflow post (Negative number of rows in data.table after incorrect use of set), it seems that I am bumping into the memory limits of the column in data.table. As a workaround, I ran the code in smaller blocks (10,000 at a time), and this seemed to work.
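The block-wise workaround might look roughly like the sketch below. explode_in_blocks() is a hypothetical helper, not part of data.table. It cuts the rows of dt into consecutive blocks, reshapes each block with the question's expression and stacks the pieces with rbindlist(). The toy IDs and follower strings are made up. Note that the final rbindlist() still builds one table of the full size. If the limit being hit is on total length rather than on the per-call work, writing each block to disk instead (for example with fwrite(..., append = TRUE)) may be the safer variant.

```r
library(data.table)

# Hypothetical helper: reshape the table block by block instead of all at once.
# block_size = 10000 mirrors the block size mentioned in the question.
explode_in_blocks <- function(dt, block_size = 10000L) {
  idx <- seq_len(nrow(dt))
  # Group row indices into consecutive blocks of at most block_size rows
  blocks <- split(idx, ceiling(idx / block_size))
  pieces <- lapply(blocks, function(rows) {
    # Same split-and-unlist expression as the question, on one block only
    dt[rows, list(FOLLOWERS = unlist(strsplit(FOLLOWERS, split = ",", fixed = TRUE))),
       by = X.USERID]
  })
  # Stack the per-block results into a single data.table
  rbindlist(pieces)
}

# Hypothetical toy data; block_size = 2 forces more than one block here
dt <- data.table(
  X.USERID  = c(1081L, 2002L, 3003L),
  FOLLOWERS = c("11,12,13", "21", "31,32")
)
res <- explode_in_blocks(dt, block_size = 2L)
print(res)
```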

My question is: can I prevent this error by changing my code, or am I bumping into the limits of R?
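One code change worth trying is sketched below. It assumes, unverified, that the problem lies in the grouped (by =) evaluation rather than in the size of the result. The idea is to split all follower strings in one vectorised call and repeat each user ID with rep() and lengths(), so no per-group work is needed. The toy IDs are hypothetical. The result has exactly as many rows as the by = version, so this alone does not help if the total row count is the real problem.

```r
# Hypothetical toy data in the question's input format
users     <- c(1081L, 2002L)
followers <- c("4053807021,2476584389,4713715543", "580410695")

# Split every follower string once, vectorised over all users
parts <- strsplit(followers, split = ",", fixed = TRUE)

# Repeat each user ID once per follower, then flatten the follower lists
uf <- data.frame(
  X.USERID  = rep(users, lengths(parts)),
  FOLLOWERS = unlist(parts, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(uf)
```

The same two vectors can be passed to data.table() instead of data.frame() if a data.table is wanted.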

PS. I have a machine with 140 GB of RAM at my disposal, so physical memory should not be the issue:

> memory.limit()
[1] 147446
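Physical RAM is not the only limit in R. Standard integers, which R and many packages use for lengths and indices, top out at .Machine$integer.max (2,147,483,647). A length computed in 32-bit integer arithmetic that goes past that value can end up negative, which is one common way a "negative length" error arises. Whether that is what happens inside data.table here is an assumption. The sketch below estimates the number of output rows from the comma count before reshaping. count_followers() is a hypothetical helper, and it assumes every FOLLOWERS string is non-empty, not NA, and has no trailing comma.

```r
# Hypothetical helper: number of followers per string = number of commas + 1.
# Assumes strings are non-empty, not NA, and have no trailing comma.
count_followers <- function(x) {
  n_commas <- nchar(x) - nchar(gsub(",", "", x, fixed = TRUE))
  n_commas + 1L
}

# Hypothetical toy input
followers <- c("4053807021,2476584389,4713715543", "580410695")
per_user  <- count_followers(followers)

# as.numeric() avoids integer overflow when summing very large counts
total_rows <- sum(as.numeric(per_user))

# TRUE would mean the reshaped table exceeds the 32-bit integer range
total_rows > .Machine$integer.max
```

On the real data, a TRUE here would point toward an integer-range limit rather than a shortage of RAM.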

Recommended answer