I have a table that looks like this:
df <- read.table(text =
" Day location gender hashtags
'Feb 19 2016' 'UK' 'M' '#a'
'Feb 19 2016' 'UK' 'M' '#b'
'Feb 19 2016' 'SP' 'F' '#a'
'Feb 19 2016' 'SP' 'F' '#b'
'Feb 19 2016' 'SP' 'M' '#a'
'Feb 19 2016' 'SP' 'M' '#b'
'Feb 20 2016' 'UK' 'F' '#a'",
header = TRUE, stringsAsFactors = FALSE)
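I want to count frequencies per day and hashtag, broken down by location and gender, so that the resulting table looks like this: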
Day          hashtags  Daily_Freq  men  women  Freq_UK  Freq_SP
Feb 19 2016  #a                 3    2      1        1        2
Feb 19 2016  #b                 3    2      1        1        2
Feb 20 2016  #a                 1    0      1        1        0
where Daily_Freq = men + women = Freq_UK + Freq_SP.
How can I do this?
Best answer
Using dplyr:
library(dplyr)
df %>%
group_by(Day, hashtags) %>%
summarise(Daily_Freq = n(),
men = sum(gender == 'M'),
women = sum(gender == 'F'),
Freq_UK = sum(location == 'UK'),
Freq_SP = sum(location == 'SP'))
This gives one row per Day/hashtags combination, with the five requested count columns.
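The question requires Daily_Freq = men + women = Freq_UK + Freq_SP, and that can be checked directly on the summarised result. Below is a minimal sketch, assuming the sample data is rebuilt inline with data.frame() (equivalent to the read.table() call in the question) and using res as an illustrative name for the result. stopifnot() stops with an error if either split fails to add up to the daily total.

```r
library(dplyr)

# Same sample data as in the question, built without read.table()
df <- data.frame(
  Day      = c(rep("Feb 19 2016", 6), "Feb 20 2016"),
  location = c("UK", "UK", "SP", "SP", "SP", "SP", "UK"),
  gender   = c("M", "M", "F", "F", "M", "M", "F"),
  hashtags = c("#a", "#b", "#a", "#b", "#a", "#b", "#a"),
  stringsAsFactors = FALSE
)

# The summarise() call from the answer, stored in res
res <- df %>%
  group_by(Day, hashtags) %>%
  summarise(Daily_Freq = n(),
            men = sum(gender == 'M'),
            women = sum(gender == 'F'),
            Freq_UK = sum(location == 'UK'),
            Freq_SP = sum(location == 'SP'))

# Both breakdowns must add up to the daily total; stopifnot() errors otherwise
stopifnot(with(res, all(Daily_Freq == men + women)),
          with(res, all(Daily_Freq == Freq_UK + Freq_SP)))
```

The check only holds while gender takes just the values M/F and location just UK/SP. Any other value is still counted in Daily_Freq (via n()) but falls into none of the hard-coded columns, so the assertion would flag it.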
The same logic implemented in data.table:
library(data.table)
setDT(df)[, .(Daily_Freq = .N,
men = sum(gender == 'M'),
women = sum(gender == 'F'),
Freq_UK = sum(location == 'UK'),
Freq_SP = sum(location == 'SP'))
, by = .(Day, hashtags)]
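Both answers hard-code the location and gender values. If more locations can appear, one option is to reshape with data.table's dcast() so that each distinct value becomes its own count column. This is a swapped-in technique, not part of the original answer. The sketch below assumes a data.table version recent enough to accept patterns() in .SDcols (1.12.0 or later). The names keys, by_loc and by_gender are illustrative only.

```r
library(data.table)

# Same sample data as in the question
df <- data.table(
  Day      = c(rep("Feb 19 2016", 6), "Feb 20 2016"),
  location = c("UK", "UK", "SP", "SP", "SP", "SP", "UK"),
  gender   = c("M", "M", "F", "F", "M", "M", "F"),
  hashtags = c("#a", "#b", "#a", "#b", "#a", "#b", "#a")
)

keys <- c("Day", "hashtags")

# One count column per location value; with fun.aggregate = length,
# combinations that never occur are filled with length(<empty>) = 0
by_loc <- dcast(df, Day + hashtags ~ location,
                fun.aggregate = length, value.var = "gender")
loc_cols <- setdiff(names(by_loc), keys)
setnames(by_loc, loc_cols, paste0("Freq_", loc_cols))

# One count column per gender; renamed to match the question's headers
by_gender <- dcast(df, Day + hashtags ~ gender,
                   fun.aggregate = length, value.var = "location")
setnames(by_gender, c("M", "F"), c("men", "women"))

# Join the two breakdowns and derive the daily total from the location columns
res <- merge(by_gender, by_loc, by = keys)
res[, Daily_Freq := rowSums(.SD), .SDcols = patterns("^Freq_")]
print(res)
```

A new location such as FR would then produce a Freq_FR column without any code change. The gender rename, however, still assumes that both M and F occur in the data.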
For "r - summarising cells by certain columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48951869/