I have a data.table:
> (mydt <- data.table(id=c(1,1,1,1,2,2),
time=1:6,
v1=letters[1:6],
v2=LETTERS[1:6],
key=c("id","time")))
id time v1 v2
1: 1 1 a A
2: 1 2 b B
3: 1 3 c C
4: 1 4 d D
5: 2 5 e E
6: 2 6 f F
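I want to "summarize" it (is that the right term here?) into a table of "changes": object 1 changed 3 times (from timestamp 1 to 2, 2 to 3, and 3 to 4) and object 2 changed once (from time 5 to 6). I am interested in the initial v1 and the final v2 of each change, so the result should be: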
> (res <- data.table(beg.time=c(1,2,3,5),
end.time=c(2,3,4,6),
v1=c('a','b','c','e'),
v2=c('B','C','D','F'),
key=c("beg.time","end.time")))
beg.time end.time v1 v2
1: 1 2 a B
2: 2 3 b C
3: 3 4 c D
4: 5 6 e F
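Best answer
Thanks for the reproducible example! Here's a shot.
First, note that you can use the following head-tail idiom to place entries of a vector that are some distance apart next to each other: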
x <- letters[1:5]
cbind(head(x, -1), tail(x, -1))
# [,1] [,2]
# [1,] "a" "b"
# [2,] "b" "c"
# [3,] "c" "d"
# [4,] "d" "e"
cbind(head(x, -2), tail(x, -2))
# [,1] [,2]
# [1,] "a" "c"
# [2,] "b" "d"
# [3,] "c" "e"
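As a side note on the idiom above, here is a minimal sketch of how it behaves at the boundaries. The helper name `pair_offset` is hypothetical (it is neither base R nor part of the original answer) and only wraps the same head/tail calls.

```r
# Hypothetical helper (not from the original answer): pair each element of x
# with the element k positions later, using the head/tail idiom.
pair_offset <- function(x, k = 1L) {
  # head(x, -0) is the same as head(x, 0), which is empty, so require k >= 1
  stopifnot(k >= 1L)
  cbind(head(x, -k), tail(x, -k))
}

pair_offset(letters[1:5], 2L)  # 3 rows: a-c, b-d, c-e
pair_offset("a")               # 0 rows: a single element has no successor
```

The second call suggests that a one-element input simply produces no pairs, which is the case that matters once the idiom is applied group by group.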
Then, we can use the by functionality of data.table to do this by group.
mydt[, {
  ## a group with a single row has no change to report; j then
  ## evaluates to NULL and data.table skips that group
  if (.N > 1) {
    list(
      ## head and tail take the first and last parts of a vector;
      ## this places each element next to its subsequent element
      beg.time = head(time, -1),
      end.time = tail(time, -1),
      v1 = head(v1, -1),
      v2 = tail(v2, -1)
    )
  }
## group by ID
}, by = id]
# id beg.time end.time v1 v2
# 1: 1 1 2 a B
# 2: 1 2 3 b C
# 3: 1 3 4 c D
# 4: 2 5 6 e F
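The same pairing can probably also be written with `shift()`, which ships with more recent versions of data.table (an assumption about the installed version, since the original answer does not use it). This is a sketch of an alternative, not the answer's own method. The input is recreated so that the block runs on its own.

```r
library(data.table)

# Same input as in the question
mydt <- data.table(id = c(1, 1, 1, 1, 2, 2),
                   time = 1:6,
                   v1 = letters[1:6],
                   v2 = LETTERS[1:6],
                   key = c("id", "time"))

# shift(x, type = "lead") moves each value up by one row within the group and
# pads the end with NA, so the last row of every group has no successor and
# is filtered out afterwards.
res <- mydt[, .(beg.time = time,
                end.time = shift(time, type = "lead"),
                v1       = v1,
                v2       = shift(v2, type = "lead")),
            by = id][!is.na(end.time)]
res
```

Compared with the head/tail version, this keeps one row per input row until the final filter, which can be easier to extend if more lagged or leading columns are needed.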
This answer is based on a similar Stack Overflow question about summarizing a data.table in R: https://stackoverflow.com/questions/18853661/