Conditional sequence of numbers


Problem description

I have a big data.frame to which I want to add a new column (called Seq) holding a sequential count that restarts every time the value in another column changes. Here is an example of the data.frame (with some columns omitted) and the new Seq column. As you can see, there is a sequential count, but every time there is a new IDPath, the count restarts. The sequences can have different lengths: some are 1 long, while others are 300.

IDPath    LogTime               Seq
AADS      19-06-2015 01:57      1
AADS      19-06-2015 01:55      2
AADS      19-06-2015 01:54      3
AADS      19-06-2015 01:53      4
DHSD      19-06-2015 12:57      1
DHSD      19-06-2015 10:58      2
DHSD      19-06-2015 09:08      3
DHSD      19-06-2015 08:41      4

Recommended answer

Obligatory Hadleyverse answer (a base R answer is included after the Hadleyverse one):

library(dplyr)

dat <- read.table(text="IDPath    LogTime
AADS      '19-06-2015 01:57'
AADS      '19-06-2015 01:55'
AADS      '19-06-2015 01:54'
AADS      '19-06-2015 01:53'
DHSD      '19-06-2015 12:57'
DHSD      '19-06-2015 10:58'
DHSD      '19-06-2015 09:08'
DHSD      '19-06-2015 08:41'      ", header=TRUE, stringsAsFactors=FALSE, quote="'")

mutate(group_by(dat, IDPath), Seq=1:n())

Or (via David Arenburg):

mutate(group_by(dat, IDPath), Seq=row_number())

Or if you're into piping:

dat %>%
  group_by(IDPath) %>%
  mutate(Seq=1:n())

Or (via David Arenburg):

dat %>%
  group_by(IDPath) %>%
  mutate(Seq=row_number())

Obligatory base R answer:

unsplit(lapply(split(dat, dat$IDPath), transform, Seq=1:length(IDPath)), dat$IDPath)

Or more idiomatically (via David again):

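# Pass an integer vector as the first argument: with the character column
# IDPath there, ave() would return the counts as character strings.
with(dat, ave(seq_along(IDPath), IDPath, FUN = seq_along))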
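
Note that the grouped approaches above number rows per IDPath value, so if the same IDPath shows up again later in the data, the count continues instead of restarting. If the count should restart at every change of IDPath, a run-length based variant may be closer to what is wanted. The following is a minimal base R sketch; the vector `ids` is made-up illustration data, not taken from the question.

```r
# Hypothetical example data: "A" reappears after "B"
ids <- c("A", "A", "B", "A", "A")

# Group-based count: continues when "A" comes back
by_group <- ave(seq_along(ids), ids, FUN = seq_along)

# Run-based count: restarts at every change of value
by_run <- sequence(rle(ids)$lengths)

print(by_group)  # 1 2 1 3 4
print(by_run)    # 1 2 1 1 2
```

For the sample data in the question, where each IDPath forms one contiguous block, both variants give the same result.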

If it really is a huge data frame, you may want to start with tbl_dt(dat) for the dplyr solutions, but CathG's or Jaap's versions will be faster if you're already using data.table.
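
For reference, a common data.table idiom for a per-group counter looks roughly like the sketch below. It is not necessarily the code from CathG's or Jaap's answers; it assumes the data.table package is installed and rebuilds the sample data so the snippet runs on its own.

```r
library(data.table)

# Rebuild the sample data from the question
dat <- data.frame(
  IDPath  = rep(c("AADS", "DHSD"), each = 4),
  LogTime = c("19-06-2015 01:57", "19-06-2015 01:55",
              "19-06-2015 01:54", "19-06-2015 01:53",
              "19-06-2015 12:57", "19-06-2015 10:58",
              "19-06-2015 09:08", "19-06-2015 08:41"),
  stringsAsFactors = FALSE
)

setDT(dat)                              # convert to a data.table by reference
dat[, Seq := seq_len(.N), by = IDPath]  # .N is the row count of each group
print(dat)
```

Grouping with `by = IDPath` counts per distinct value; if the counter should restart at every change of IDPath instead, grouping with `by = rleid(IDPath)` is one way to do that in data.table.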
