Problem description
I have a dataset that looks something like this, with a column that can take four different values:
dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))
In R, I'd like to create a second column that tallies, in sequence, the cumulative number of rows containing each particular value. The output column would look like this:
out
1
1
1
2
1
2
2
3
2
3
3
4
Recommended answer
Try this:
dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))
with(dataset, ave(as.character(out), out, FUN = seq_along))
# [1] "1" "1" "1" "2" "1" "2" "2" "3" "2" "3" "3" "4"
Of course, you can assign the output to a column in your data.frame. Because ave returns a character vector here, wrap the result in as.numeric if you want numbers, using something like:
dataset$asNumbers <- as.numeric(with(dataset, ave(as.character(out), out, FUN = seq_along)))
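As a variation on the ave call above, here is a minimal sketch that passes an integer index, seq_along(out), as the first argument instead of as.character(out). Under that assumption the per-group sequences are written back into an integer vector, so no conversion should be needed. The column name count is only an illustrative choice.

```r
dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))

# The first argument is an integer vector, so the per-group results of
# seq_along() are stored as integers rather than as character strings.
dataset$count <- with(dataset, ave(seq_along(out), out, FUN = seq_along))

dataset$count
# [1] 1 1 1 2 1 2 2 3 2 3 3 4
```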
The "dplyr" approach is also quite nice. The logic is very similar to the "data.table" approach. An advantage is that you don't need to wrap the output with as.numeric, which would be required with the ave approach mentioned above.
library(dplyr)
dataset %>% group_by(out) %>% mutate(count = sequence(n()))
# Source: local data frame [12 x 2]
# Groups: out
#
# out count
# 1 a 1
# 2 b 1
# 3 c 1
# 4 a 2
# 5 d 1
# 6 b 2
# 7 c 2
# 8 a 3
# 9 d 2
# 10 b 3
# 11 c 3
# 12 a 4
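The "data.table" approach mentioned above is not shown in this text. A minimal sketch of what such an approach might look like, assuming the data.table package is installed, is given here for comparison. The names DT and count are illustrative and do not come from the original answer.

```r
library(data.table)

dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))
DT <- as.data.table(dataset)

# .N is the number of rows in the current group, so seq_len(.N) numbers
# the rows within each distinct value of out; := adds the column by reference.
DT[, count := seq_len(.N), by = out]

DT$count
# [1] 1 1 1 2 1 2 2 3 2 3 3 4
```

As with the dplyr version, the counter comes back as an integer column, so no as.numeric wrapper is involved.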
A third option is to use getanID from my "splitstackshape" package. For this particular example, you only need to specify the data.frame name (since it has a single column). Generally, however, you would be more specific and name the column(s) that presently serve as "ids"; the function then checks whether they are unique or whether a cumulative sequence is required to make them unique.
library(splitstackshape)
# getanID(dataset, "out") ## Example of being specific about column to use
getanID(dataset)
# out .id
# 1: a 1
# 2: b 1
# 3: c 1
# 4: a 2
# 5: d 1
# 6: b 2
# 7: c 2
# 8: a 3
# 9: d 2
# 10: b 3
# 11: c 3
# 12: a 4
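To illustrate the idea of checking whether the id column is unique and adding a cumulative sequence when it is not, here is a rough base R sketch. It is not how getanID is implemented, only an approximation of the check described above, and the column name id is a hypothetical choice.

```r
dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))

# anyDuplicated() returns 0 when there are no repeats, so a positive value
# means out alone cannot serve as a unique id.
anyDuplicated(dataset$out) > 0
# [1] TRUE

# Add a within-group running counter as a secondary id.
dataset$id <- ave(seq_along(dataset$out), dataset$out, FUN = seq_along)

# Together, out and id should now identify every row uniquely.
anyDuplicated(dataset[c("out", "id")]) == 0
# [1] TRUE
```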