Problem description
I wish to (1) group data by one variable (State), (2) within each group find the row of minimum value of another variable (Employees), and (3) extract the entire row.
(1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't get it.
Here is an example data set:
> data
  State Company Employees
1    AK       A        82
2    AK       B       104
3    AK       C        37
4    AK       D        24
5    RI       E        19
6    RI       F       118
7    RI       G        88
8    RI       H        42
data <- structure(list(State = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("AK", "RI"), class = "factor"), Company = structure(1:8, .Label = c("A",
"B", "C", "D", "E", "F", "G", "H"), class = "factor"), Employees = c(82L,
104L, 37L, 24L, 19L, 118L, 88L, 42L)), .Names = c("State", "Company",
"Employees"), class = "data.frame", row.names = c(NA, -8L))
Calculating the min by group is easy, using aggregate:
> aggregate(Employees ~ State, data, function(x) min(x))
  State Employees
1    AK        24
2    RI        19
...or data.table:
> library(data.table)
> DT <- data.table(data)
> DT[ , list(Employees = min(Employees)), by = State]
   State Employees
1:    AK        24
2:    RI        19
But how do I extract the entire row corresponding to these min values, i.e. also including Company in the result?
Recommended answer
Slightly more elegant:
library(data.table)
DT[ , .SD[which.min(Employees)], by = State]
   State Company Employees
1:    AK       D        24
2:    RI       E        19
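The grouped .SD call works within each State group and keeps the row at position which.min(Employees). Below is a rough base R analogue of that idea. It is a swapped-in technique, not part of the original answer, and it assumes a plain data.frame named data with character columns. It splits the rows by State, picks the which.min row of each piece, and binds the pieces back together:

```r
# Example data from the question, rebuilt as a plain data.frame
# (assumption: character columns instead of the factors used in the dput)
data <- data.frame(
  State     = c("AK", "AK", "AK", "AK", "RI", "RI", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L),
  stringsAsFactors = FALSE
)

# Split the rows into one data.frame per State
pieces <- split(data, data$State)

# In each piece keep the row holding the (first) minimum of Employees
mins <- lapply(pieces, function(d) d[which.min(d$Employees), ])

# Stack the one-row results back into a single data.frame
res <- do.call(rbind, mins)
print(res)
```

This loops over groups in R, so it is probably slower than the data.table version on data with many groups. It is mainly meant to show what "pick one row per group" means.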
Slightly less elegant than using .SD, but a bit faster (for data with many groups):
DT[DT[ , .I[which.min(Employees)], by = State]$V1]
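Running the inner expression on its own can make this one-liner easier to follow. In data.table, .I holds the row numbers of DT that belong to the current group, so .I[which.min(Employees)] returns one row number per State. Because that j result is unnamed, it comes back in a column called V1, and the outer DT[...] then uses V1 as a row index. The sketch below is self-contained and rebuilds the example with character columns rather than factors; the names idx and res are only illustrative:

```r
library(data.table)

# Example data from the question (assumption: character columns, not factors)
DT <- data.table(
  State     = c("AK", "AK", "AK", "AK", "RI", "RI", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L)
)

# Step 1: per State, the row number in DT of the first minimum of Employees;
# the unnamed result is stored in a column named V1
idx <- DT[ , .I[which.min(Employees)], by = State]
print(idx)

# Step 2: subset the full table by those row numbers to get entire rows
res <- DT[idx$V1]
print(res)
```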
Also, just replace the expression which.min(Employees) with Employees == min(Employees) if your data set has multiple identical min values and you'd like to subset all of them.
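The difference only shows up when a group has a tie. In the sketch below the data is hypothetical and not from the question: AK is given two companies that both have the minimum of 24 employees. which.min keeps only the first tied row, while the == min(...) filter keeps both:

```r
library(data.table)

# Hypothetical data: in AK, companies A and C tie for the minimum (24)
DT <- data.table(
  State     = c("AK", "AK", "AK", "RI", "RI"),
  Company   = c("A", "B", "C", "E", "F"),
  Employees = c(24L, 37L, 24L, 19L, 118L)
)

# which.min returns the position of the first minimum only: one row per State
first_only <- DT[ , .SD[which.min(Employees)], by = State]

# A logical filter keeps every row equal to the group minimum
all_ties <- DT[ , .SD[Employees == min(Employees)], by = State]

print(first_only)
print(all_ties)
```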
See also: "Subset rows corresponding to max value by group using data.table".