Problem description
I have a dataframe that (simplified) looks something like this:
```
Index StudioEvent
1
2     MovieStart
3
4
5
6
7     MovieEnd
8
9
10    MovieStart
11
12
13
14
15    MovieEnd
```
I would like to create a third column containing a sequence that starts at 0 on the row where StudioEvent = MovieStart, increases by 50 on each following row, and ends on the row where StudioEvent = MovieEnd. So something like this:
```
Index StudioEvent Sequence
1
2     MovieStart  0
3                 50
4                 100
5                 150
6                 200
7     MovieEnd    250
8
9
10    MovieStart  0
11                50
12                100
13                150
14                200
15    MovieEnd    250
```
Any idea how I can do it? Thank you in advance.
Recommended answer
An option using data.table:
```r
# cs is 1 on a MovieStart row and on every row after it, up to (but not including) the matching MovieEnd row
DT[, cs := cumsum(StudioEvent == "MovieStart") - cumsum(StudioEvent == "MovieEnd")]
# rolling join: for the MovieEnd rows and the rows with cs == 1, look up the Index of the most recent MovieStart
DT[StudioEvent == "MovieEnd" | cs == 1L,
   ms := DT[StudioEvent == "MovieStart"][.SD, on = .(Index), roll = Inf, x.Index]
]
# generate the sequence: 50 per row since MovieStart (assumes Index increases by exactly 1 on every row)
DT[, Sequence := (Index - ms) * 50]
```
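The rolling join in the second step can be tried in isolation. The following minimal sketch uses two small made-up tables (`starts` and `queries` are hypothetical names, not from the answer): `roll = Inf` matches each query Index to the last start Index at or before it, and a query that comes before every start gets NA.

```r
library(data.table)

# Hypothetical toy tables: Index values of MovieStart rows, and the rows to look up
starts  <- data.table(Index = c(2L, 10L))
queries <- data.table(Index = c(1L, 3L, 7L, 12L))

# For each row of `queries`, take the last row of `starts` whose Index is <= the
# query's Index (roll = Inf carries the previous start forward) and return its Index
ms <- starts[queries, on = .(Index), roll = Inf, x.Index]
print(ms)
# expected: NA 2 2 10
```

In the answer, `DT[StudioEvent == "MovieStart"]` plays the role of `starts` and `.SD` (the rows selected in `i`) plays the role of `queries`.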
Output:
```
    Index StudioEvent cs ms Sequence
 1:     1              0 NA       NA
 2:     2  MovieStart  1  2        0
 3:     3              1  2       50
 4:     4              1  2      100
 5:     5              1  2      150
 6:     6              1  2      200
 7:     7    MovieEnd  0  2      250
 8:     8              0 NA       NA
 9:     9              0 NA       NA
10:    10  MovieStart  1 10        0
11:    11              1 10       50
12:    12              1 10      100
13:    13              1 10      150
14:    14              1 10      200
15:    15    MovieEnd  0 10      250
```
Data:
```r
library(data.table)
DT <- fread("Index,StudioEvent
1,
2,MovieStart
3,
4,
5,
6,
7,MovieEnd
8,
9,
10,MovieStart
11,
12,
13,
14,
15,MovieEnd")
```
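As an alternative sketch (not part of the original answer), the sequence can also be built by numbering the rows inside each MovieStart to MovieEnd block instead of subtracting Index values, so it does not rely on Index increasing by exactly 1 per row. The helper columns `block` and `inside` are names introduced here for illustration, and the data is rebuilt with `data.table()` so the snippet runs on its own, with blank cells assumed to be empty strings.

```r
library(data.table)

# Same data as above, built directly; blank StudioEvent cells are "" here
DT <- data.table(
  Index = 1:15,
  StudioEvent = c("", "MovieStart", "", "", "", "", "MovieEnd",
                  "", "", "MovieStart", "", "", "", "", "MovieEnd")
)

# block: running count of MovieStart rows (which movie a row belongs to).
# %in% (unlike ==) returns FALSE for NA, so NA blanks would not break the cumsum.
DT[, block := cumsum(StudioEvent %in% "MovieStart")]

# inside: TRUE from a MovieStart through its MovieEnd; the end flag is lagged
# by one row so that the MovieEnd row itself still counts as inside
DT[, inside := block - cumsum(shift(StudioEvent %in% "MovieEnd", fill = FALSE)) == 1L]

# number the rows of each block 0, 1, 2, ... and scale by 50; other rows stay NA
DT[(inside), Sequence := (seq_len(.N) - 1L) * 50L, by = block]
print(DT)
```

Because the counter comes from `seq_len(.N)` within each block rather than from `Index - ms`, this variant yields 0, 50, ..., 250 per block even if Index had gaps.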