
If you have an R data.table that contains missing values, how do you replace all of them with, say, the value 0? For example:

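library(data.table)

aa = data.table(V1=1:10,V2=c(1,2,2,3,3,3,4,4,4,4))
bb = data.table(V1=3:6,X=letters[1:4])
# join on V1, keeping every row of aa; rows without a match get NA in X
tt = bb[aa, on="V1"]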

    V1  X V2
 1:  1 NA  1
 2:  2 NA  2
 3:  3  a  2
 4:  4  b  3
 5:  5  c  3
 6:  6  d  3
 7:  7 NA  4
 8:  8 NA  4
 9:  9 NA  4
10: 10 NA  4


Is there a way to do this in one line? If it were just a matrix, you could do:

tt[is.na(tt)] = 0


is.na (being a primitive) has relatively little overhead and is usually quite fast. So you can just loop through the columns and use set to replace NA with 0.

Assigning with <- will result in a copy of all the columns, which is not the idiomatic way to use data.table.


First I'll illustrate how to do it, and then show how slow the copying approach can get on large data:

for (i in seq_along(tt)) set(tt, i=which(is.na(tt[[i]])), j=i, value=0)


You'll get a warning here that 0 is being coerced to character to match the type of column X. You can ignore it.
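If you would rather avoid the coercion warning altogether, one option is to pick a fill value whose type matches each column. The following is only a sketch, assuming every column is character, integer, or double; the fill variable and the choice of "0" for character columns are illustrative and not part of the original answer.

```r
library(data.table)

aa <- data.table(V1 = 1:10, V2 = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4))
bb <- data.table(V1 = 3:6, X = letters[1:4])
tt <- bb[aa, on = "V1"]

for (j in seq_along(tt)) {
  col <- tt[[j]]
  # assumption: columns are character, integer, or double only
  fill <- if (is.character(col)) "0" else if (is.integer(col)) 0L else 0
  # replace NA in column j by reference; no copy of tt is made
  set(tt, i = which(is.na(col)), j = j, value = fill)
}
```

Because the fill value already has the column's type, set has nothing to coerce.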

# by reference - idiomatic way
tt <- data.table(matrix(sample(c(NA, rnorm(10)), 1e7*3, TRUE), ncol=3))
# modifies values by reference - no copy
system.time({
    for (i in seq_along(tt))
        set(tt, i=which(is.na(tt[[i]])), j=i, value=0)
})
#   user  system elapsed
#  0.284   0.083   0.386

# by copy - NOT the idiomatic way
tt <- data.table(matrix(sample(c(NA, rnorm(10)), 1e7*3, TRUE), ncol=3))
# makes copy
system.time({tt[is.na(tt)] <- 0})
# a bunch of "tracemem" output showing the copies being made
#   user  system elapsed
#  4.110   0.976   5.187
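
For numeric columns specifically, newer data.table releases (1.12.4 and later, as far as I know) also provide setnafill, which fills NA by reference without an explicit loop. A minimal sketch, assuming such a version is installed; the table dt and its column names are made up for illustration. setnafill only handles numeric columns, so the character column is deliberately excluded via cols.

```r
library(data.table)

# hypothetical example table: two numeric columns and one character column
dt <- data.table(a = c(1, NA, 3), b = c(NA, 2, NA), s = c("x", NA, "z"))

# select only the numeric columns; setnafill does not accept character columns
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]

# fill NA with the constant 0, modifying dt in place (no copy)
setnafill(dt, type = "const", fill = 0, cols = num_cols)
```

The character column s keeps its NA here; it would still need the set loop shown earlier.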
