
Problem description


I want to subset a data frame by a factor column, keeping only the factor levels whose frequency is above a certain threshold.

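# factor() makes the column a factor; since R 4.0.0, data.frame() keeps strings as character by default
df <- data.frame(factor = factor(c(rep("a", 5), rep("b", 5), rep("c", 2))), variable = rnorm(12))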

This code creates the following data frame:

   factor    variable
1       a -1.55902013
2       a  0.22355431
3       a -1.52195456
4       a -0.32842689
5       a  0.85650212
6       b  0.00962240
7       b -0.06621508
8       b -1.41347823
9       b  0.08969098
10      b  1.31565582
11      c -1.26141417
12      c -0.33364069
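Both the loop and the answer below depend on the per-level row counts. A minimal sketch of how to inspect them with table(), assuming a threshold of 5 as in the question (counts and rare are illustrative names, not from the original post):

```r
# Recreate the example data with an explicit factor column
df <- data.frame(factor = factor(c(rep("a", 5), rep("b", 5), rep("c", 2))),
                 variable = rnorm(12))

# Number of rows per level, in level order (a, b, c)
counts <- table(df$factor)
print(counts)

# Names of the levels with fewer than 5 rows
rare <- names(counts)[counts < 5]
print(rare)
```

Here only level c (2 rows) falls below the threshold.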


I want to drop the factor levels that occur fewer than 5 times. I wrote a for loop, and it works:

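counts <- table(df$factor)
df.new <- df
for (i in seq_along(counts)) {
  if (counts[i] < 5) {
    # filter df.new (not df) so every rare level is removed, not only the last one
    df.new <- df.new[df.new$factor != names(counts)[i], ]
  }
}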


But is there a faster, prettier solution?

Recommended answer

How about:

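# as.numeric() gives each row's level code; which() gives the codes of levels seen fewer than 5 times
df.new <- df[!(as.numeric(df$factor) %in% which(table(df$factor) < 5)), ]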
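To see why the one-liner works, here is a sketch that splits it into steps on the example data. It relies on the column being a factor (on a character column, as.numeric() would return NA). It also applies droplevels(), since subsetting alone leaves the now-empty level c in levels(df.new$factor), and shows a variant that compares level names instead of integer codes. The names codes, rare_idx, rare_names and df.alt are illustrative, not part of the original answer.

```r
df <- data.frame(factor = factor(c(rep("a", 5), rep("b", 5), rep("c", 2))),
                 variable = rnorm(12))

# 1. Integer code of each row's level (a = 1, b = 2, c = 3)
codes <- as.integer(df$factor)

# 2. Positions (= level codes) of the levels with fewer than 5 rows
rare_idx <- which(table(df$factor) < 5)

# 3. Keep the rows whose level code is not a rare one
df.new <- df[!(codes %in% rare_idx), ]

# Subsetting keeps "c" among the factor's levels; droplevels() removes it
df.new <- droplevels(df.new)

# Equivalent filter that matches on level names instead of codes
counts <- table(df$factor)
rare_names <- names(counts)[counts < 5]
df.alt <- droplevels(df[!(df$factor %in% rare_names), ])
```

Matching on names does not depend on the order of the levels, which makes it slightly easier to read and harder to break.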


08-29 05:19