This article covers how to keep only the first row of an R data.table for each combination of values in multiple columns.

Problem description





I'd like to get the first row only from a data.table, grouped by multiple columns.

This is straightforward with a single column, e.g.:

library(data.table)

(dt <- data.table(x = c(1, 1, 1, 2),
                  y = c(1, 1, 2, 2),
                  z = c(1, 2, 1, 2)))
#     x y z
# |1: 1 1 1
# |2: 1 1 2
# |3: 1 2 1
# |4: 2 2 2
dt[!duplicated(x)] # Remove rows 2-3
#     x y z
# |1: 1 1 1
# |2: 2 2 2

But none of the following approaches works when removing duplicates based on two columns, i.e. in this case removing only row 2:

dt[!duplicated(x, y)] # Returns the original data set unchanged
#     x y z
# |1: 1 1 1
# |2: 1 1 2
# |3: 1 2 1
# |4: 2 2 2
dt[!duplicated(list(x, y))] # Same as above
dt[!duplicated(c("x", "y"))] # Same as above
dt[!duplicated(list("x", "y"))] # Same as above
dt[!duplicated(c(x, y))] # Only removes duplicates from first column
#     x y z
# |1: 1 1 1
# |2: 2 2 2
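
A possible explanation for the results above, sketched with base R only: the second positional argument of duplicated() is incomparables rather than a second key column, a list of two vectors has just two elements, and c(x, y) is a single vector of length 8. The vectors x and y below simply mirror the columns of dt.

```r
# Base R only; x and y mirror the columns of dt in the question.
x <- c(1, 1, 1, 2)
y <- c(1, 1, 2, 2)

# The second positional argument is `incomparables`, so every value of x
# that also occurs in y is treated as never duplicated.
dup_xy <- duplicated(x, y)

# A list holding two different vectors has two elements, neither a duplicate.
dup_list <- duplicated(list(x, y))

# c(x, y) concatenates both columns into one long vector.
dup_c <- duplicated(c(x, y))

length(dup_xy)    # 4
length(dup_list)  # 2
length(dup_c)     # 8
```

None of these calls compares the rows pair by pair, which is why none of them removes only row 2.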

The only exception is this, which works only when pasting the values cannot make two different (x, y) pairs look identical:

dt[!duplicated(paste0(x, y))]
#     x y z
# |1: 1 1 1
# |2: 1 2 1
# |3: 2 2 2
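
To illustrate where the paste0 approach can go wrong, here is a minimal sketch. The table dt2 is hypothetical and not part of the original question; it is built so that two different (x, y) pairs paste to the same string.

```r
library(data.table)

# Hypothetical data: (1, 11) and (11, 1) are different pairs,
# but paste0() turns both into "111".
dt2 <- data.table(x = c(1, 11), y = c(11, 1), z = c(1, 2))

keys <- dt2[, paste0(x, y)]

# The second row is wrongly treated as a duplicate of the first.
collided <- dt2[!duplicated(paste0(x, y))]
nrow(collided)  # 1

# A separator that cannot occur in the values avoids this particular collision.
separated <- dt2[!duplicated(paste(x, y, sep = "_"))]
nrow(separated)  # 2
```

Adding a separator helps for numeric columns like these, but it still relies on the separator never appearing in the data.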
Solution

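data.table provides S3 methods for unique, duplicated and anyDuplicated, each of which takes a by argument naming the columns to compare. Calling

unique(dt, by = c('x', 'y'))

keeps the first row for each distinct combination of x and y, which is what you want.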
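
As a rough, self-contained sketch of the answer applied to the table from the question, the following also shows two alternatives that should select the same rows: the data.table method for duplicated() and grouping with .SD[1]. The variable names first_rows, via_dup and via_group are only illustrative.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2),
                 y = c(1, 1, 2, 2),
                 z = c(1, 2, 1, 2))

# First row for each distinct (x, y) pair.
first_rows <- unique(dt, by = c("x", "y"))

# Same selection via the data.table method for duplicated().
via_dup <- dt[!duplicated(dt, by = c("x", "y"))]

# Same selection by grouping and taking the first row of each group.
via_group <- dt[, .SD[1], by = .(x, y)]

# Index of the first row that repeats an earlier (x, y) pair (0 if none).
first_dup <- anyDuplicated(dt, by = c("x", "y"))

first_rows$z  # 1 1 2
first_dup     # 2
```

unique() with by is the most direct of the three; the grouping form is mainly useful when the row to keep per group is chosen by something other than position.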


10-14 05:47