Using data.table to flag the first (or last) record in a group

Problem description

Given a sort key, is there a data.table shortcut that duplicates the first and last functionality found in SAS and SPSS?

The pedestrian approach below flags the first record of each group. Given the elegance of data.table (with which I'm slowly getting familiar), I'm assuming there's a shortcut using a self join and mult, but I'm still trying to figure it out.

Here's the example:

    require(data.table)

    set.seed(123)
    n <- 17
    DT <- data.table(x = sample(letters[1:3], n, replace = T),
                     y = sample(LETTERS[1:3], n, replace = T))
    sortkey <- c("x", "y")
    setkeyv(DT, sortkey)

    # Build one string per row from the key columns, then flag every row
    # whose string differs from the previous row's
    key <- paste(DT$x, DT$y, sep = "-")
    nw <- c(T, key[2:n] != key[1:(n-1)])
    DT$first <- 1 * nw
    DT

Solution

Here are a couple of solutions using data.table:

    ## Option 1 (cleaner solution, added 2016-11-29)
    uDT <- unique(DT)
    DT[, c("first", "last") := 0L]
    DT[uDT, first := 1L, mult = "first"]
    DT[uDT, last := 1L, mult = "last"]

    ## Option 2 (original answer, retained for posterity)
    DT <- cbind(DT, first = 0L, last = 0L)
    DT[DT[unique(DT), , mult = "first", which = TRUE], first := 1L]
    DT[DT[unique(DT), , mult = "last", which = TRUE], last := 1L]

    head(DT)
    #      x y first last
    # [1,] a A     1    1
    # [2,] a B     1    1
    # [3,] a C     1    0
    # [4,] a C     0    1
    # [5,] b A     1    1
    # [6,] b B     1    1

There's obviously a lot packed into each of those lines. The key construct, though, is the following, which returns the row index of the first record in each group. Here mult = "first" keeps only the first matching row for each row of unique(DT), and which = TRUE returns row numbers instead of the rows themselves:

    DT[unique(DT), , mult = "first", which = TRUE]
    # [1] 1 2 3 5 6 7 11 13 15
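For comparison, the same flags can also be computed without a join, by grouping on the key columns and comparing each row's position within its group to 1 and to .N. This is a minimal sketch of an alternative idiom, not part of the answer above; it reuses the question's toy data, and the names first_chk and last_chk are introduced here purely for illustration. No output is shown because the values depend on the random number generator.

```r
library(data.table)

# Toy data as in the question
set.seed(123)
n <- 17
DT <- data.table(x = sample(letters[1:3], n, replace = TRUE),
                 y = sample(LETTERS[1:3], n, replace = TRUE))
setkeyv(DT, c("x", "y"))  # sorts DT by x, then y

# Within each (x, y) group, position 1 is the first record and
# position .N (the group size) is the last record
DT[, c("first", "last") := list(as.integer(seq_len(.N) == 1L),
                                as.integer(seq_len(.N) == .N)),
   by = .(x, y)]

# Cross-check (illustrative names): duplicated() marks repeats of a key
# combination, so its negation marks first (or, with fromLast, last) records
first_chk <- as.integer(!duplicated(DT, by = c("x", "y")))
last_chk  <- as.integer(!duplicated(DT, by = c("x", "y"), fromLast = TRUE))
stopifnot(identical(DT$first, first_chk),
          identical(DT$last, last_chk))
```

The grouped version relies on the rows already being in the desired order inside each group, which setkeyv guarantees here. A group with a single record gets both flags set to 1, matching the output shown in the answer.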