Problem description
I want to assign a number to each group in a data frame. For example, I have the following data frame:
df = data.frame( from = c('a', 'a', 'b'), dest = c('b', 'c', 'd') )
#> df
#  from dest
#1    a    b
#2    a    c
#3    b    d
I want to group by the values of the from column and give each group a group number. This is the expected result:
result = data.frame( from = c('a', 'a', 'b'), dest = c('b', 'c', 'd'), group_no = c(1,1,2) )
#> result
#  from dest group_no
#1    a    b        1
#2    a    c        1
#3    b    d        2
I can solve this problem using a for loop as follows:
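library(dplyr)  # provides the %>% pipe used below

groups = df$from %>% unique  # distinct from values, in order of first appearance
i = 0
df$group_no = NA
for ( g in groups ) {
  i = i + 1
  # number every row whose from value equals the current group
  df[ df$from == g, ]$group_no = i
}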
#> df
#  from dest group_no
#1    a    b        1
#2    a    c        1
#3    b    d        2
Is it possible to solve this problem in a more elegant and functional way, without a for loop? Specifically, can this be done using the dplyr::group_by function?
Recommended answer
Use mutate to add a column that holds the integer codes of from converted to a factor:
df %>% mutate(group_no = as.integer(factor(from)))
#   from dest group_no
# 1    a    b        1
# 2    a    c        1
# 3    b    d        2
...or simply, without the pipe:
mutate(df, group_no = as.integer(factor(from)))
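One detail that may matter: factor() sorts its levels, so the numbers follow the sorted order of from, whereas the for loop in the question numbers groups in order of first appearance. The two agree on the example data only because 'a' happens to come before 'b'. The sketch below uses a made-up data frame (not from the question) in which they differ, and shows match() against unique() as one base R way to reproduce the loop's numbering:

```r
# Hypothetical data in which the first group to appear is not the first alphabetically
df <- data.frame(from = c("b", "a", "b"), dest = c("x", "y", "z"))

# factor() sorts its levels, so "a" gets code 1 even though "b" appears first
by_sorted <- as.integer(factor(df$from))

# match() against unique() numbers groups in order of first appearance,
# which is what the for loop in the question does
by_appearance <- match(df$from, unique(df$from))

print(by_sorted)
print(by_appearance)
```

Either expression can be dropped into mutate(); which one is preferable depends on whether the numbering should be stable under row reordering (the factor approach) or follow the row order (the match approach).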
Note that group_by isn't necessary here, unless you're using it for other purposes. If you want to group by the new column for later use, you can use group_by instead of mutate to add the column.
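As a rough illustration of that last point, group_by accepts a named expression and adds it as a grouping column in one step. The sketch below also shows cur_group_id(), which is not part of the original answer; it assumes dplyr 1.0.0 or later and returns the index of the current group inside a grouped mutate. The variable names grouped, counts and with_id are chosen here for illustration only.

```r
library(dplyr)

df <- data.frame(from = c("a", "a", "b"), dest = c("b", "c", "d"))

# group_by() computes the new column and groups by it in a single step
grouped <- df %>% group_by(group_no = as.integer(factor(from)))

# Later grouped operations then work per group_no, e.g. counting rows
counts <- grouped %>% summarise(n = n())

# Assumption: dplyr >= 1.0.0. cur_group_id() gives each group's index directly
with_id <- df %>%
  group_by(from) %>%
  mutate(group_no = cur_group_id()) %>%
  ungroup()

print(counts)
print(with_id)
```

The first form leaves the data frame grouped by group_no, which is convenient if a summarise or another grouped mutate follows; the second keeps from as the grouping variable and drops the grouping at the end with ungroup().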