How to select the rows with the maximum value in each group with dplyr?

Question

I would like to select the row with the maximum value in each group with dplyr.

First, I generate some random data to illustrate the question:

set.seed(1)
df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))
df$value <- runif(nrow(df))

In plyr, I could use a custom function to select this row:

library(plyr)
ddply(df, .(A, B), function(x) x[which.max(x$value),])

In dplyr, I am using this code to get the maximum value, but it does not return the row that holds the maximum (the value of column C is lost in this case):

library(dplyr)
df %>% group_by(A, B) %>%
    summarise(max = max(value))

How can I achieve this? Thanks for any suggestions.

sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dplyr_0.2  plyr_1.8.1

loaded via a namespace (and not attached):
[1] assertthat_0.1.0.99 parallel_3.1.0      Rcpp_0.11.1
[4] tools_3.1.0

Recommended answer

Try this:

result <- df %>%
  group_by(A, B) %>%
  filter(value == max(value)) %>%
  arrange(A, B, C)

It seems to work:

identical(
  as.data.frame(result),
  ddply(df, .(A, B), function(x) x[which.max(x$value),])
)
#[1] TRUE
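
To see why this works: on a grouped data frame, `max(value)` is evaluated within each group, so every row is compared against its own group's maximum rather than the maximum of the whole column. Here is a minimal sketch on a made-up toy data frame (`toy`, `g`, `id` and `toy_max` are hypothetical names used only for illustration):

```r
library(dplyr)

# Hypothetical toy data: two groups with one clear maximum each
toy <- data.frame(
  g     = c("a", "a", "b", "b"),
  id    = 1:4,
  value = c(0.2, 0.9, 0.5, 0.1)
)

# max(value) is computed per group of g, not over the whole column
toy_max <- toy %>%
  group_by(g) %>%
  filter(value == max(value)) %>%
  ungroup()

toy_max$id  # ids of the rows holding each group's maximum
```

Row 3 (value 0.5) is kept even though it is smaller than the overall maximum of 0.9, because it is the largest value in group "b".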

As pointed out in the comments, and as per @RoyalITS' answer, slice may be preferred here if you strictly want only one row per group. This answer returns multiple rows for a group when several rows share the same maximum value.
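
The difference only shows up when there are ties, which the random data in the question is unlikely to produce. The following sketch uses a small made-up data frame (`ties`, `by_filter` and `by_slice` are hypothetical names) and assumes a dplyr version that provides `slice()`, which is newer than the 0.2 release shown in the question's sessionInfo():

```r
library(dplyr)

# Hypothetical data with a tie for the maximum in group "a"
ties <- data.frame(
  g     = c("a", "a", "a", "b"),
  id    = 1:4,
  value = c(1, 3, 3, 2)
)

# filter() keeps every row equal to the group maximum: two rows for "a"
by_filter <- ties %>%
  group_by(g) %>%
  filter(value == max(value)) %>%
  ungroup()

# which.max() returns the index of the first maximum, so slice() keeps
# exactly one row per group
by_slice <- ties %>%
  group_by(g) %>%
  slice(which.max(value)) %>%
  ungroup()

nrow(by_filter)  # 3
nrow(by_slice)   # 2
```

In dplyr 1.0.0 and later, `slice_max(value, n = 1, with_ties = FALSE)` should be another way to get one row per group; which of the two is preferable probably depends on the dplyr version you can rely on.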
