This article describes how to keep only the first row for each combination of multiple columns in an R data.table.
Problem description
I'd like to get the first row only from a data.table, grouped by multiple columns.
This is straightforward with a single column, e.g.:
    library(data.table)

    (dt <- data.table(x = c(1, 1, 1, 2), y = c(1, 1, 2, 2), z = c(1, 2, 1, 2)))
    #    x y z
    # 1: 1 1 1
    # 2: 1 1 2
    # 3: 1 2 1
    # 4: 2 2 2

    dt[!duplicated(x)]  # Remove rows 2-3
    #    x y z
    # 1: 1 1 1
    # 2: 2 2 2
But none of these approaches works when trying to remove duplicates based on two columns, i.e. in this case removing only row 2:
    dt[!duplicated(x, y)]            # Keeps the original data set
    #    x y z
    # 1: 1 1 1
    # 2: 1 1 2
    # 3: 1 2 1
    # 4: 2 2 2

    dt[!duplicated(list(x, y))]      # Same as above
    dt[!duplicated(c("x", "y"))]     # Same as above
    dt[!duplicated(list("x", "y"))]  # Same as above

    dt[!duplicated(c(x, y))]         # Only removes duplicates from first column
    #    x y z
    # 1: 1 1 1
    # 2: 2 2 2
Except for this, which only works in certain cases:
    dt[!duplicated(paste0(x, y))]
    #    x y z
    # 1: 1 1 1
    # 2: 1 2 1
    # 3: 2 2 2
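One likely reason the paste0() approach only works in certain cases is that different value pairs can paste to the same string. The sketch below uses a hypothetical table dt2 (not part of the original question) in which (1, 11) and (11, 1) both become "111". The "_" separator is an assumption that holds only if that character never appears in the data.

```r
library(data.table)

# Hypothetical data: rows 1 and 2 have different (x, y) pairs,
# row 3 is a genuine duplicate of row 1.
dt2 <- data.table(x = c(1, 11, 1), y = c(11, 1, 11), z = 1:3)

# paste0() yields "111" for every row, so row 2 is dropped by mistake.
by_paste <- dt2[!duplicated(paste0(x, y))]

# With a separator the keys are "1_11", "11_1", "1_11",
# so only the genuine duplicate (row 3) is dropped.
by_sep <- dt2[!duplicated(paste(x, y, sep = "_"))]

print(by_paste)
print(by_sep)
```

Even with a separator, this compares values as text, so it remains a workaround rather than a general fix.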
Solution
data.table provides S3 methods for unique, duplicated and anyDuplicated.

    unique(dt, by = c('x', 'y'))

will give you what you want.
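As a minimal sketch, here is the answer applied to the table from the question, together with the other two methods it mentions. The variable names (first_rows, dup_flags, same_rows, first_dup) are illustrative only, and the example assumes a reasonably recent data.table release.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2), y = c(1, 1, 2, 2), z = c(1, 2, 1, 2))

# Keep the first row of each (x, y) combination; only row 2 is dropped.
first_rows <- unique(dt, by = c("x", "y"))

# duplicated() takes the same `by` argument and returns one logical per row,
# which can be negated and used as a row filter.
dup_flags <- duplicated(dt, by = c("x", "y"))
same_rows <- dt[!dup_flags]

# anyDuplicated() returns the index of the first duplicated row (0 if none).
first_dup <- anyDuplicated(dt, by = c("x", "y"))

print(first_rows)
```

Here unique(dt, by = ...) and dt[!duplicated(dt, by = ...)] select the same rows. The second form is handy when the logical vector itself is needed, for example to inspect the rows that would be dropped.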