Writing a function in R that groups factor levels by frequency, keeps the 2 largest categories, and pools the rest into an "Other" category
Question
I would like to write a function in R that takes a single factor variable and a parameter n as inputs, counts the cases per category in the factor variable, keeps only the n categories with the most cases, and pools all other categories into an "other" category. The function must be applied to multiple variables, keeping the 2 largest categories of each variable and pooling all of that variable's remaining categories into "other".
Example:
var1 <- c("square", "square", "square", "circle", "square", "square", "circle",
"square", "circle", "circle", "circle", "circle", "square", "circle", "triangle", "circle", "circle", "rectangle")
var2 <- c("orange", "orange", "orange", "orange", "blue", "orange", "blue",
"blue", "orange", "blue", "blue", "blue", "orange", "orange", "orange", "orange", "green", "purple")
df <- data.frame(var1, var2)
Thank you very much!
Recommended answer
forcats::fct_lump_n() exists for exactly this purpose:
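library(forcats)
library(dplyr)

# Keep the 2 most frequent levels of every column; all other levels become "Other".
# across() is the current replacement for the superseded mutate_all().
df %>%
  mutate(across(everything(), ~ fct_lump_n(.x, n = 2)))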
     var1   var2
1  square orange
2  square orange
3  square orange
4  circle orange
5  square   blue
6  square orange
7  circle   blue
8  square   blue
9  circle orange
10 circle   blue
11 circle   blue
12 circle   blue
13 square orange
14 circle orange
15  Other orange
16 circle orange
17 circle  Other
18  Other  Other
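
Since the question asks for a hand-written function, here is a minimal base R sketch of the same idea, for cases where adding forcats is not an option. The helper name lump_top_n and its arguments are hypothetical and do not come from the answer above. The sketch also breaks ties at the cut-off by the order of table(), which may differ from how fct_lump_n() handles ties.

```r
# Example data from the question.
var1 <- c("square", "square", "square", "circle", "square", "square", "circle",
          "square", "circle", "circle", "circle", "circle", "square", "circle",
          "triangle", "circle", "circle", "rectangle")
var2 <- c("orange", "orange", "orange", "orange", "blue", "orange", "blue",
          "blue", "orange", "blue", "blue", "blue", "orange", "orange",
          "orange", "orange", "green", "purple")
df <- data.frame(var1, var2)

# Hypothetical helper (not part of forcats): keep the n most frequent
# categories of x and pool every other non-missing value into `other`.
lump_top_n <- function(x, n = 2, other = "Other") {
  x <- as.character(x)
  # Count cases per category (NA is ignored), most frequent first.
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  # Relabel everything outside the top n; leave missing values alone.
  x[!is.na(x) & !(x %in% keep)] <- other
  # Order levels by frequency, with the pooled level last if it is used.
  factor(x, levels = unique(c(keep, if (other %in% x) other)))
}

# Apply the helper to every column of the data frame.
lumped <- df
lumped[] <- lapply(df, lump_top_n, n = 2)

table(lumped$var1)  # circle: 9, square: 7, Other: 2
table(lumped$var2)  # orange: 10, blue: 6, Other: 2
```

To get the lowercase "other" label the question mentions, pass other = "other" to this sketch; fct_lump_n() takes an other_level argument for the same purpose.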