Question
I would like to create a new variable that counts the number of previous items within each by-group. Here is what I mean, using the esoph dataset as an example.
First, I sort the dataset by my by-group variables, esoph$agegp and esoph$alcgp, and then by an additional value column in descending order (-esoph$ncontrols).
This gives me the following dataset:
x <- esoph[order(esoph$agegp, esoph$alcgp, -esoph$ncontrols), ]
x
agegp alcgp tobgp ncases ncontrols
1 25-34 0-39g/day 0-9g/day 0 40
2 25-34 0-39g/day 10-19 0 10
3 25-34 0-39g/day 20-29 0 6
4 25-34 0-39g/day 30+ 0 5
5 25-34 40-79 0-9g/day 0 27
6 25-34 40-79 10-19 0 7
8 25-34 40-79 30+ 0 7
7 25-34 40-79 20-29 0 4
9 25-34 80-119 0-9g/day 0 2
11 25-34 80-119 30+ 0 2
...
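A note on the -esoph$ncontrols term: order() sorts every key in ascending order, so negating a numeric key is a common way to make just that key descending while the others stay ascending. A minimal base R sketch on made-up data (the vectors g and v are hypothetical):

```r
# Hypothetical toy data: a grouping key and a numeric value column
g <- c("b", "a", "a", "b", "a")
v <- c(3, 1, 5, 2, 4)

# Ascending by g, then descending by v (negating the numeric key)
o <- order(g, -v)
print(o)
# [1] 3 5 2 1 4
```

Negation only works for numeric keys; for a factor or character key, xtfrm() can turn it into a numeric sort key first, e.g. order(g, -xtfrm(v)).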
Now, I would like to create a new variable with some sort of index that increases by one on every row. Whenever the next by-group starts, the index goes back to 1.
The resulting table would be the following (with the additional index column):
agegp alcgp tobgp ncases ncontrols index
1 25-34 0-39g/day 0-9g/day 0 40 1
2 25-34 0-39g/day 10-19 0 10 2
3 25-34 0-39g/day 20-29 0 6 3
4 25-34 0-39g/day 30+ 0 5 4
5 25-34 40-79 0-9g/day 0 27 1
6 25-34 40-79 10-19 0 7 2
8 25-34 40-79 30+ 0 7 3
7 25-34 40-79 20-29 0 4 4
9 25-34 80-119 0-9g/day 0 2 1
11 25-34 80-119 30+ 0 2 2
...
How can I calculate this column?
Thanks!
Answer
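This can be done with a specialized package such as dplyr, which has row_number(). We group by the by-group variables (agegp and alcgp) and create the new column with mutate(). Because the rows of x are already sorted, row_number() counts 1, 2, ... within each group in that order.
library(dplyr)
x <- x %>%
  group_by(agegp, alcgp) %>%
  mutate(index = row_number()) %>%
  ungroup()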
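Or use ave() from base R. We group by agegp and alcgp, and in FUN we specify seq_along. I used seq_along(alcgp) as the first argument rather than alcgp itself, because ave() returns a vector of the same type as its first argument, so it would not work when the variable is of class factor (as agegp and alcgp are in esoph).
x$index <- with(x, ave(seq_along(alcgp), agegp, alcgp, FUN = seq_along))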
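To see why the first argument matters: ave() writes the group-wise results back into a copy of its first argument, so the result keeps that argument's type. A minimal sketch on a hypothetical toy factor f:

```r
f <- factor(c("a", "a", "b", "a", "b"))

# Integer first argument: running position within each level of f
good <- ave(seq_along(f), f, FUN = seq_along)
print(good)
# [1] 1 2 1 3 2

# Factor first argument: the integer results are assigned back into a factor
# that has no such levels, so every element becomes NA (with a warning)
bad <- suppressWarnings(ave(f, f, FUN = seq_along))
print(bad)
```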
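Another convenient function is getanID() from splitstackshape, which adds a .id column numbering the rows within each combination of the given ID variables:
library(splitstackshape)
getanID(x, c("agegp", "alcgp"))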
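If the rows are already sorted so that each by-group is contiguous (as they are in x), the index is simply the position within each run of identical keys, i.e. the number of previous rows in the same group plus one. A minimal base R sketch of that idea; the helper run_index and the toy vectors are hypothetical, not part of any package:

```r
# Hypothetical helper: position of each row within its run of identical keys.
# Assumes the rows are already sorted so that each by-group is contiguous.
run_index <- function(...) {
  key <- paste(..., sep = "\t")   # combine the by-group columns into one key
  sequence(rle(key)$lengths)      # 1, 2, ..., n for each run of equal keys
}

agegp <- c("25-34", "25-34", "25-34", "35-44", "35-44")
alcgp <- c("0-39",  "0-39",  "40-79", "0-39",  "0-39")
idx <- run_index(agegp, alcgp)
print(idx)
# [1] 1 2 1 1 2
```

On the sorted data this would be x$index <- run_index(x$agegp, x$alcgp). Unlike the grouped approaches above, it depends on the sort order: if a group's rows are not contiguous, the count restarts at each new run.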