

I have a huge dataframe df1, whose oversimplified version consists of 3 columns, "Words", "Frequency" and "Letters":

Words           Frequency   Letters
flower/tree     0.15        a(0.1)
tree            0.67        a(0.4)
planet          0.85        b(0.4)
tree/planet     0.42        c(0.5)
tree            0.89        a(0.6)
flower          0.21        b(0.4)
flower/planet   0.53        b
planet          0.07        a

Using R (dplyr, apply family functions, etc.), I would like to count how many times each letter (a, b, c) in the "Letters" column is associated with each individual word in the "Words" column (flower, tree, planet), separately for each frequency bin of the "Frequency" column values. There are 4 bins: [0, 0.25], [0.25, 0.5], [0.5, 0.75], [0.75, 1].

I expect an output dataframe df2 that looks something like this:

Bin       Word    Letters    count_letters
0-0.25    flower  a          1
0-0.25    flower  b          1
0-0.25    tree    a          1
0-0.25    planet  a          1
0.25-0.5  tree    c          1
0.25-0.5  planet  c          1
0.5-0.75  flower  b          1
0.5-0.75  tree    a          1
0.5-0.75  planet  b          1
0.75-1    tree    a          1
0.75-1    planet  b          1


You can use cut to bin Frequency, substr to clean Letters, and tidyr::separate_rows to unnest Words. Aggregate with dplyr::count, and you're set:


library(tidyverse)

df1 %>% separate_rows(Words) %>% 
    mutate(Letters = substr(Letters, 1, 1),    # use regex if more than one letter
           Frequency = cut(Frequency, breaks = seq(0, 1, .25))) %>% 
    count(Frequency, Words, Letters)

## Source: local data frame [11 x 4]
## Groups: Frequency, Words [?]
##     Frequency  Words Letters     n
##        <fctr>  <chr>   <chr> <int>
## 1    (0,0.25] flower       a     1
## 2    (0,0.25] flower       b     1
## 3    (0,0.25] planet       a     1
## 4    (0,0.25]   tree       a     1
## 5  (0.25,0.5] planet       c     1
## 6  (0.25,0.5]   tree       c     1
## 7  (0.5,0.75] flower       b     1
## 8  (0.5,0.75] planet       b     1
## 9  (0.5,0.75]   tree       a     1
## 10   (0.75,1] planet       b     1
## 11   (0.75,1]   tree       a     1
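
If you want the bin labels and column names to match the df2 layout from the question, a variant along these lines may work. It is only a sketch: the sample data is retyped from the question, the label strings and the explicit sep = "/" are my own choices, and include.lowest = TRUE is added on the assumption that a Frequency of exactly 0 should fall into the first bin (by default cut leaves it out, since the intervals are open on the left).

```r
library(dplyr)
library(tidyr)

# Sample data retyped from the question
df1 <- data.frame(
  Words = c("flower/tree", "tree", "planet", "tree/planet",
            "tree", "flower", "flower/planet", "planet"),
  Frequency = c(0.15, 0.67, 0.85, 0.42, 0.89, 0.21, 0.53, 0.07),
  Letters = c("a(0.1)", "a(0.4)", "b(0.4)", "c(0.5)",
              "a(0.6)", "b(0.4)", "b", "a"),
  stringsAsFactors = FALSE
)

df2 <- df1 %>%
  separate_rows(Words, sep = "/") %>%           # one row per individual word
  mutate(
    Letters = substr(Letters, 1, 1),            # keep only the leading letter
    Bin = cut(Frequency,
              breaks = seq(0, 1, 0.25),
              labels = c("0-0.25", "0.25-0.5", "0.5-0.75", "0.75-1"),  # assumed label text
              include.lowest = TRUE)            # assumption: 0 belongs to the first bin
  ) %>%
  count(Bin, Words, Letters, name = "count_letters")

print(df2)
```

The name argument of count needs a reasonably recent dplyr; on older versions you could rename the n column afterwards instead.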

