Problem description

I have a dataframe that (simplified) looks like this:

Index  StudioEvent
1
2      MovieStart
3
4
5
6
7      MovieEnd
8
9
10     MovieStart
11
12
13
14
15     MovieEnd

I would like to add a third column holding a sequence that starts at 0 and increases in steps of 50. The sequence should begin on the row where StudioEvent = MovieStart and end on the row where StudioEvent = MovieEnd, leaving all other rows empty. Something like this:

Index  StudioEvent  Sequence
1
2      MovieStart   0
3                   50
4                   100
5                   150
6                   200
7      MovieEnd     250
8
9
10     MovieStart   0
11                  50
12                  100
13                  150
14                  200
15     MovieEnd     250

Any idea how I can do this? Thank you in advance.

Recommended answer

One option using data.table:

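# Requires library(data.table) and the DT object from the Data section below

# cs is 1 on rows from MovieStart up to (but not including) MovieEnd, and 0 elsewhere
DT[, cs := cumsum(StudioEvent == "MovieStart") - cumsum(StudioEvent == "MovieEnd")]

# For rows inside a movie, plus the MovieEnd rows, use a rolling join to look up
# the Index of the most recent MovieStart and store it in ms
DT[StudioEvent == "MovieEnd" | cs == 1L,
    ms := DT[StudioEvent == "MovieStart"][.SD, on = .(Index), roll = Inf, x.Index]
]

# Sequence is the distance from the movie's start row, in steps of 50
DT[, Sequence := (Index - ms) * 50]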

Output:

    Index StudioEvent cs ms Sequence
 1:     1              0 NA       NA
 2:     2  MovieStart  1  2        0
 3:     3              1  2       50
 4:     4              1  2      100
 5:     5              1  2      150
 6:     6              1  2      200
 7:     7    MovieEnd  0  2      250
 8:     8              0 NA       NA
 9:     9              0 NA       NA
10:    10  MovieStart  1 10        0
11:    11              1 10       50
12:    12              1 10      100
13:    13              1 10      150
14:    14              1 10      200
15:    15    MovieEnd  0 10      250
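
In the output, cs is 1 from each MovieStart row up to the row before MovieEnd, and it is already back to 0 on the MovieEnd row itself. That is why the join step selects the MovieEnd rows explicitly in addition to cs == 1L. A minimal base R sketch of that behaviour on a made-up vector (ev is a hypothetical name used only for illustration, not part of the answer):

```r
# Hypothetical event vector: one movie spanning positions 2 to 4
ev <- c("", "MovieStart", "", "MovieEnd", "")

# Number of starts seen so far minus number of ends seen so far
cs <- cumsum(ev == "MovieStart") - cumsum(ev == "MovieEnd")

print(cs)
# values: 0 1 1 0 0 (the MovieEnd position is already 0)
```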

Data:

library(data.table)
DT <- fread("Index,StudioEvent
1,
2,MovieStart
3,
4,
5,
6,
7,MovieEnd
8,
9,
10,MovieStart
11,
12,
13,
14,
15,MovieEnd")
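
As a possible alternative to the rolling join, the same column can be sketched with a grouped row counter. This is not the answer's method: it counts rows rather than differences in Index, so it assumes the rows are sorted and that Index increases by 1 per row, and it assumes every MovieStart has a matching MovieEnd. The names DT2, movie and inside are hypothetical helpers introduced here for illustration.

```r
library(data.table)

# Same sample data, built explicitly so that blank events are "" rather than NA
DT2 <- data.table(
  Index = 1:15,
  StudioEvent = c("", "MovieStart", "", "", "", "", "MovieEnd", "", "",
                  "MovieStart", "", "", "", "", "MovieEnd")
)

# movie: running count of MovieStart rows, used as an id for each movie
DT2[, movie := cumsum(StudioEvent == "MovieStart")]

# inside: TRUE from a MovieStart row through its MovieEnd row.
# Lagging the MovieEnd count by one row keeps the MovieEnd row itself inside.
DT2[, inside := movie > shift(cumsum(StudioEvent == "MovieEnd"), fill = 0L)]

# Within each movie, number the rows from 0 and scale by the step of 50
DT2[inside == TRUE, Sequence := (seq_len(.N) - 1L) * 50L, by = movie]

# Drop the helper columns once Sequence is in place
DT2[, c("movie", "inside") := NULL]
```

Rows outside any movie are left as NA in Sequence, as in the answer's output.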
