How to perform further grouping and lookups on a data.table within .SD

Problem description

In the following sample hflights data, I would like to find the maximum and minimum ArrDelay, together with the corresponding UniqueCarrier and Dest, for each Origin airport and Month. I got it to work, but I feel it could be made simpler.

For each .SD, I can find min(ArrDelay) and max(ArrDelay), but I also need the airline and Dest corresponding to the minimum and maximum ArrDelay. Is there a way to perform that lookup?
    library(data.table)
    library(hflights)

    DT <- as.data.table(hflights)
    setkey(DT, Origin, Month)

    DT[, c(head(.SD[, .(MaxArrDelay = max(ArrDelay, na.rm = TRUE), Dest),
                    by = UniqueCarrier][order(-MaxArrDelay)], 1),
           head(.SD[, .(MinArrDelay = min(ArrDelay, na.rm = TRUE), Dest),
                    by = UniqueCarrier][order(MinArrDelay)], 1)),
       by = .(Origin, Month)]

    # Test the output for a single record...
    DT[.("HOU", 1), .(max(na.omit(ArrDelay)), min(na.omit(ArrDelay)))]

Solution

You can try

    library(data.table) # v1.9.5+

    res1 <- DT[, {
      # Row positions of the smallest and largest ArrDelay within each group
      min1 <- which.min(ArrDelay)
      max1 <- which.max(ArrDelay)
      list(DestMin          = Dest[min1],
           ArrDelayMin      = ArrDelay[min1],
           MinUniqueCarrier = UniqueCarrier[min1],
           DestMax          = Dest[max1],
           ArrDelayMax      = ArrDelay[max1],
           MaxUniqueCarrier = UniqueCarrier[max1])
    }, by = .(Origin, Month)]

Or this can be made more compact:

    nm1 <- c('Dest', 'ArrDelay', 'UniqueCarrier')
    res2 <- DT[, c(.SD[which.min(ArrDelay)], .SD[which.max(ArrDelay)]),
               by = .(Origin, Month), .SDcols = nm1]
    setnames(res2, 3:ncol(res2),
             paste0(nm1, rep(c('Min', 'Max'), each = length(nm1))))

    all.equal(res1, res2, check.attributes = FALSE)
    #[1] TRUE

Or using dplyr:

    library(dplyr)

    grh <- group_by(hflights, Origin, Month)

    Min <- grh %>%
      slice(which.min(ArrDelay)) %>%
      select(Dest, ArrDelay, UniqueCarrier) %>%
      setNames(., c(names(.)[1:2], paste0(names(.)[3:5], 'Min')))

    Max <- grh %>%
      slice(which.max(ArrDelay)) %>%
      select(Dest, ArrDelay, UniqueCarrier) %>%
      setNames(., c(names(.)[1:2], paste0(names(.)[3:5], 'Max')))

    bind_cols(Min, Max[-(1:2)])
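To see the index-lookup idea from the first data.table answer in isolation, here is a minimal sketch on a hypothetical five-row table. The table `toy` and its values are invented for illustration and are not taken from hflights; the output column names are also just one possible choice.

```r
library(data.table)

# Hypothetical toy data standing in for hflights (values are made up)
toy <- data.table(
  Origin        = c("HOU", "HOU", "HOU", "IAH", "IAH"),
  Month         = c(1L, 1L, 1L, 1L, 1L),
  UniqueCarrier = c("WN", "AA", "DL", "CO", "XE"),
  Dest          = c("DAL", "DFW", "ATL", "EWR", "AUS"),
  ArrDelay      = c(5, NA, -12, 40, 3)
)

# which.min()/which.max() skip NA values and return one row position per
# group; that position is then used to look up the other columns.
res <- toy[, {
  i_min <- which.min(ArrDelay)
  i_max <- which.max(ArrDelay)
  list(DestMin     = Dest[i_min],
       CarrierMin  = UniqueCarrier[i_min],
       ArrDelayMin = ArrDelay[i_min],
       DestMax     = Dest[i_max],
       CarrierMax  = UniqueCarrier[i_max],
       ArrDelayMax = ArrDelay[i_max])
}, by = .(Origin, Month)]

print(res)
```

Note that which.min() and which.max() return only the first position when several rows tie for the extreme value, so this sketch reports one flight per group even if more than one shares the same delay.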