Problem description
I have a huge dataframe df1. A much-simplified version of it has 3 columns, "Words", "Frequency" and "Letters":
Words          Frequency  Letters
flower/tree    0.15       a(0.1)
tree           0.67       a(0.4)
planet         0.85       b(0.4)
tree/planet    0.42       c(0.5)
tree           0.89       a(0.6)
flower         0.21       b(0.4)
flower/planet  0.53       b
planet         0.07       a
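For anyone who wants to reproduce the sample, the table above could be built roughly as follows. This is only a sketch: it assumes Words and Letters are stored as character vectors and Frequency as numeric, which the question does not state explicitly.

```r
# Sample rows copied from the table above.
# stringsAsFactors = FALSE keeps Words and Letters as character vectors
# (an assumption about how the real df1 is stored).
df1 <- data.frame(
  Words = c("flower/tree", "tree", "planet", "tree/planet",
            "tree", "flower", "flower/planet", "planet"),
  Frequency = c(0.15, 0.67, 0.85, 0.42, 0.89, 0.21, 0.53, 0.07),
  Letters = c("a(0.1)", "a(0.4)", "b(0.4)", "c(0.5)",
              "a(0.6)", "b(0.4)", "b", "a"),
  stringsAsFactors = FALSE
)

str(df1)  # inspect column types
```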
Using R (dplyr, the apply family of functions, etc.), I would like to count how many times each letter (a, b, c) in the "Letters" column is associated with each individual word in the "Words" column (flower, tree, planet), computed separately for each frequency bin of the "Frequency" column values. There are 4 bins: [0, 0.25], [0.25, 0.5], [0.5, 0.75], [0.75, 1].
I expect an output dataframe df2 that looks something like this:
Bin       Word    Letters  count_letters
0-0.25    flower  a        1
0-0.25    flower  b        1
0-0.25    tree    a        1
0-0.25    planet  a        1
0.25-0.5  tree    c        1
0.25-0.5  planet  c        1
0.5-0.75  flower  b        1
0.5-0.75  tree    a        1
0.5-0.75  planet  b        1
0.75-1    tree    a        1
0.75-1    planet  b        1
Recommended answer
You can use cut to bin Frequency, substr to clean Letters, and tidyr::separate_rows to unnest Words. Aggregate with dplyr::count, and you're set:
library(tidyverse)

df1 %>%
  separate_rows(Words) %>%    # one row per word: "flower/tree" becomes "flower" and "tree"
  count(Frequency = cut(Frequency, breaks = seq(0, 1, .25)),    # bins (0,0.25], ..., (0.75,1]
        Words,
        Letters = substr(Letters, 1, 1))    # use regex if more than one letter
## Source: local data frame [11 x 4]
## Groups: Frequency, Words [?]
##
##     Frequency  Words Letters     n
##        <fctr>  <chr>   <chr> <int>
## 1    (0,0.25] flower       a     1
## 2    (0,0.25] flower       b     1
## 3    (0,0.25] planet       a     1
## 4    (0,0.25]   tree       a     1
## 5  (0.25,0.5] planet       c     1
## 6  (0.25,0.5]   tree       c     1
## 7  (0.5,0.75] flower       b     1
## 8  (0.5,0.75] planet       b     1
## 9  (0.5,0.75]   tree       a     1
## 10   (0.75,1] planet       b     1
## 11   (0.75,1]   tree       a     1
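Two details of the approach may matter on real data, and the base-R sketch below illustrates both. First, cut builds intervals that are open on the left and closed on the right by default, so a Frequency of exactly 0 would land in no bin and become NA; include.lowest = TRUE closes the first interval. Second, substr(Letters, 1, 1) only works for single-character letters; stripping the parenthesised suffix with a regular expression is one possible alternative. The vectors freq and letters_raw are hypothetical values made up for illustration, and the "0-0.25" style labels are just one way to mimic the output requested in the question.

```r
# Hypothetical values chosen to exercise the bin edges.
freq <- c(0, 0.25, 0.42, 1)
breaks <- seq(0, 1, 0.25)

# Default behaviour: intervals are (a, b], so 0 falls outside every bin (NA).
default_bins <- cut(freq, breaks = breaks)

# include.lowest = TRUE closes the first interval at 0; the custom labels
# are optional and only imitate the "0-0.25" style from the question.
bins <- cut(freq, breaks = breaks, include.lowest = TRUE,
            labels = c("0-0.25", "0.25-0.5", "0.5-0.75", "0.75-1"))

# Hypothetical Letters values, including a multi-character one.
letters_raw <- c("a(0.1)", "b", "ab(0.4)")

# Remove everything from the first "(" onwards, keeping the full letter code.
letters_clean <- sub("\\(.*$", "", letters_raw)

print(default_bins)
print(bins)
print(letters_clean)
```

Either expression could replace the corresponding argument inside count if the real data contains zeros or multi-character letter codes.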