How to extract the first n rows per group?
Question
I have a data.table dt. This data.table is sorted first by column date (my grouping variable), then by column age:
library(data.table)
setkeyv(dt, c("date", "age")) # Sorts table first by column "date" then by "age"
> dt
         date age     name
1: 2000-01-01   3   Andrew
2: 2000-01-01   4      Ben
3: 2000-01-01   5  Charlie
4: 2000-01-02   6     Adam
5: 2000-01-02   7      Bob
6: 2000-01-02   8 Campbell
My question is: is it possible to extract the first 2 rows for each unique date? Or, phrased more generally:
How do I extract the first n rows within each group?
In this example, the result in dt.f would be:
> dt.f = ???????? # function of dt to extract the first 2 rows per unique date
> dt.f
         date age   name
1: 2000-01-01   3 Andrew
2: 2000-01-01   4    Ben
3: 2000-01-02   6   Adam
4: 2000-01-02   7    Bob
P.S. Here is the code to create the aforementioned data.table:
install.packages("data.table")
library(data.table)
date <- c("2000-01-01","2000-01-01","2000-01-01",
"2000-01-02","2000-01-02","2000-01-02")
age <- c(3,4,5,6,7,8)
name <- c("Andrew","Ben","Charlie","Adam","Bob","Campbell")
dt <- data.table(date, age, name)
setkeyv(dt,c("date","age")) # Sorts table first by column "date" then by "age"
Recommended answer
Yep, just use .SD and index it as needed.
DT[, .SD[1:2], by=date]
         date age   name
1: 2000-01-01   3 Andrew
2: 2000-01-01   4    Ben
3: 2000-01-02   6   Adam
4: 2000-01-02   7    Bob
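One caveat worth knowing: indexing .SD with 1:2 pads a group with NA rows when that group has fewer than 2 rows. The sketch below is not part of the original answer; it uses a cut-down version of the question's table (the second date deliberately has a single row, and the variable names padded and trimmed are only illustrative) to contrast .SD[1:n] with head(.SD, n), which simply returns whatever rows exist.

```r
library(data.table)

# Illustrative data: the second date has only one row (an assumption
# made for this example, not the data from the question).
dt <- data.table(
  date = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-02"),
  age  = c(3, 4, 5, 6),
  name = c("Andrew", "Ben", "Charlie", "Adam")
)
setkeyv(dt, c("date", "age"))

n <- 2

# Positional indexing: a group shorter than n is padded with NA rows.
padded <- dt[, .SD[1:n], by = date]

# head() never returns more rows than the group actually has.
trimmed <- dt[, head(.SD, n), by = date]

print(padded)
print(trimmed)
```

If every group is guaranteed to have at least n rows, both forms should give the same result.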
Edited as per @eddi's suggestion.
@eddi's suggestion is spot on:
Use this instead, for speed:
DT[DT[, .I[1:2], by = date]$V1]
# using a slightly larger data set
> microbenchmark(SDstyle=DT[, .SD[1:2], by=date], IStyle=DT[DT[, .I[1:2], by = date]$V1], times=200L)
Unit: milliseconds
    expr       min        lq    median        uq      max neval
 SDstyle 13.567070 16.224797 22.170302 24.239881 88.26719   200
  IStyle  1.675185  2.018773  2.168818  2.269292 11.31072   200
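To make the .I approach easier to follow, here is a step-by-step sketch on the table from the question (named dt there, DT in the answer). .I holds the row numbers of the current group within the whole table, so the inner call only collects integer positions, and the outer call subsets the table once. Using head(.I, n) instead of .I[1:n] is a small variation of my own, so that groups with fewer than n rows do not produce NA indices; the intermediate name idx is likewise just illustrative.

```r
library(data.table)

# Data from the question.
date <- c("2000-01-01", "2000-01-01", "2000-01-01",
          "2000-01-02", "2000-01-02", "2000-01-02")
age  <- c(3, 4, 5, 6, 7, 8)
name <- c("Andrew", "Ben", "Charlie", "Adam", "Bob", "Campbell")
dt <- data.table(date, age, name)
setkeyv(dt, c("date", "age"))

n <- 2

# Step 1: per group, keep the first n row numbers. The unnamed result
# column is called V1 by data.table.
idx <- dt[, head(.I, n), by = date]$V1

# Step 2: subset the full table once with those row numbers.
dt.f <- dt[idx]

print(idx)
print(dt.f)
```

The likely reason this is faster is that only a vector of row numbers is built per group, rather than a subsetted copy of .SD for every group.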