Summarize cells by certain columns

I have a table that looks like this:

 df <- read.table(text =
      "  Day            location     gender    hashtags
       'Feb 19 2016'       'UK'      'M'       '#a'
       'Feb 19 2016'       'UK'      'M'       '#b'
       'Feb 19 2016'       'SP'      'F'       '#a'
       'Feb 19 2016'       'SP'      'F'       '#b'
       'Feb 19 2016'       'SP'      'M'       '#a'
       'Feb 19 2016'       'SP'      'M'       '#b'
       'Feb 20 2016'       'UK'      'F'       '#a'",
                 header = TRUE, stringsAsFactors = FALSE)

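I want to compute frequencies per day and hashtag, broken down by location and gender, so that the result table looks like this: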
           Day hashtags Daily_Freq men women Freq_UK Freq_SP
   Feb 19 2016       #a          3   2     1       1       2
   Feb 19 2016       #b          3   2     1       1       2
   Feb 20 2016       #a          1   0     1       1       0

where Daily_Freq = men + women = Freq_UK + Freq_SP.
How can I do this?

Best answer

Using dplyr:

library(dplyr)
df %>%
  group_by(Day, hashtags) %>%
  summarise(Daily_Freq = n(),
            men = sum(gender == 'M'),
            women = sum(gender == 'F'),
            Freq_UK = sum(location == 'UK'),
            Freq_SP = sum(location == 'SP'))

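This gives one row per Day/hashtags pair:

           Day hashtags Daily_Freq men women Freq_UK Freq_SP
   Feb 19 2016       #a          3   2     1       1       2
   Feb 19 2016       #b          3   2     1       1       2
   Feb 20 2016       #a          1   0     1       1       0
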
The same logic implemented in data.table:
library(data.table)
setDT(df)[, .(Daily_Freq = .N,
              men = sum(gender == 'M'),
              women = sum(gender == 'F'),
              Freq_UK = sum(location == 'UK'),
              Freq_SP = sum(location == 'SP'))
          , by = .(Day, hashtags)]
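
Both versions rely on the same idea: a comparison such as gender == 'M' yields a logical vector, and summing it counts the TRUE values within each group. As a rough sketch of that idea without any packages, the following base R version builds 0/1 indicator columns and sums them per group with aggregate(). The names indicators and res are made up for illustration and are not part of the original answer; the data frame is rebuilt with data.frame() only so the block is self-contained.

```r
# Same rows as the example table in the question, rebuilt so the block runs on its own.
df <- data.frame(
  Day      = c(rep("Feb 19 2016", 6), "Feb 20 2016"),
  location = c("UK", "UK", "SP", "SP", "SP", "SP", "UK"),
  gender   = c("M", "M", "F", "F", "M", "M", "F"),
  hashtags = c("#a", "#b", "#a", "#b", "#a", "#b", "#a"),
  stringsAsFactors = FALSE
)

# One 0/1 indicator column per quantity to count (hypothetical helper table).
# Daily_Freq is 1 for every row, so its group sum is the row count.
indicators <- data.frame(
  Daily_Freq = rep(1L, nrow(df)),
  men        = as.integer(df$gender == "M"),
  women      = as.integer(df$gender == "F"),
  Freq_UK    = as.integer(df$location == "UK"),
  Freq_SP    = as.integer(df$location == "SP")
)

# Sum the indicators within each Day/hashtags group.
res <- aggregate(indicators, by = df[c("Day", "hashtags")], FUN = sum)

# aggregate() varies the first grouping column fastest, so sort for readability.
res <- res[order(res$Day, res$hashtags), ]
rownames(res) <- NULL

# Check the identity stated in the question.
stopifnot(
  all(res$Daily_Freq == res$men + res$women),
  all(res$Daily_Freq == res$Freq_UK + res$Freq_SP)
)
print(res)
```

The stopifnot() call only confirms the Daily_Freq identity for this example data; it would fail if gender or location contained values other than the ones counted here.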

Regarding "r - Summarize cells by certain columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48951869/