我需要分配一个“第二个” ID来对原始id
中的某些值进行分组。这是我的示例数据:
dt<-structure(list(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"),
period = c("start", "end", "start", "end", "start", "end"),
date = structure(c(15401L, 15401L, 15581L, 15762L, 15430L, 15747L), class = c("IDate", "Date"))),
class = c("data.table", "data.frame"),
.Names = c("id", "period", "date"),
sorted = "id")
> dt
id period date
1: aaaa start 2012-03-02
2: aaaa end 2012-03-05
3: aaas start 2012-08-21
4: aaas end 2013-02-25
5: bbbb start 2012-03-31
6: bbbb end 2013-02-11
需要根据以下列表对列
id
进行分组(使用与说出id2
相同的值):> groups
[[1]]
[1] "aaaa" "aaas"
[[2]]
[1] "bbbb"
我使用了以下代码,该代码似乎可以提供以下
warning
: > dt[, id2 := which(vapply(groups, function(x,y) any(x==y), .BY[[1]], FUN.VALUE=T)), by=id]
Warning message:
In `[.data.table`(dt, , `:=`(id2, which(vapply(groups, function(x, :
Invalid .internal.selfref detected and fixed by taking a copy of the whole table,
so that := can add this new column by reference. At an earlier point, this data.table has
been copied by R (or been created manually using structure() or similar). Avoid key<-,
names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use
set* syntax instead to avoid copying: setkey(), setnames() and setattr(). Also,
list (DT1,DT2) will copy the entire DT1 and DT2 (R's list() copies named objects),
use reflist() instead if needed (to be implemented). If this message doesn't help,
please report to datatable-help so the root cause can be fixed.
> dt
id period date id2
1: aaaa start 2012-03-02 1
2: aaaa end 2012-03-02 1
3: aaas start 2012-08-29 1
4: aaas end 2013-02-26 1
5: bbbb start 2012-03-31 2
6: bbbb end 2013-02-11 2
有人可以简要说明此警告的性质以及最终结果(如果有)的任何最终含义吗?谢谢
编辑:
以下代码实际上是在创建
dt
时显示的,以及如何传递给发出警告的函数:f.main <- function(){
f2 <- function(x){
groups <- list(c("aaaa", "aaas"), "bbbb") # actually generated depending on the similarity between values of x$id
x <- x[, id2 := which(vapply(groups, function(x,y) any(x==y), .BY[[1]], FUN.VALUE=T)), by=id]
return(x)
}
x <- f1()
if(!is.null(x[["res"]])){
x <- f2(x[["res"]])
return(x)
} else {
# something else
}
}
f1 <- function(){
dt<-data.table(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"),
period = c("start", "end", "start", "end", "start", "end"),
date = structure(c(15401L, 15401L, 15581L, 15762L, 15430L, 15747L), class = c("IDate", "Date")))
return(list(res=dt, other_results=""))
}
> f.main()
id period date id2
1: aaaa start 2012-03-02 1
2: aaaa end 2012-03-02 1
3: aaas start 2012-08-29 1
4: aaas end 2013-02-26 1
5: bbbb start 2012-03-31 2
6: bbbb end 2013-02-11 2
Warning message:
In `[.data.table`(x, , `:=`(id2, which(vapply(groups, function(x, :
Invalid .internal.selfref detected and fixed by taking a copy of the whole table,
so that := can add this new column by reference. At an earlier point, this data.table
has been copied by R (or been created manually using structure() or similar).
Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole
data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr().
Also, list(DT1,DT2) will copy the entire DT1 and DT2 (R's list() copies named objects),
use reflist() instead if needed (to be implemented). If this message doesn't help,
please report to datatable-help so the root cause can be fixed.
最佳答案
是的,问题出在清单上。这是一个简单的示例:
DT <- data.table(1:5)
mylist1 <- list(DT,"a")
mylist1[[1]][,id:=.I]
#warning
mylist2 <- list(data.table(1:5),"a")
mylist2[[1]][,id:=.I]
#no warning
您应该避免将data.table复制到列表中(出于安全考虑,我完全避免在列表中包含DT)。试试这个:
f1 <- function(){
mylist <- list(res=data.table(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"),
period = c("start", "end", "start", "end", "start", "end"),
date = structure(c(15401L, 15401L, 15581L, 15762L, 15430L, 15747L), class = c("IDate", "Date"))))
other_results <- ""
mylist$other_results <- other_results
mylist
}