How do you subset a data frame in R based on a minimum sample size?

Question

Let's say you have a data frame with two factor columns that looks like this:

Factor1    Factor2    Value
A          1          0.75
A          1          0.34
A          2          1.21   
A          2          0.75 
A          2          0.53
B          1          0.42
B          2          0.21  
B          2          0.18
B          2          1.42

etc.
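
For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds only the nine rows shown above. The name mydf is chosen to match the answer below, and storing Factor1 as character (stringsAsFactors = FALSE) is an assumption, since the question does not state the column classes.

```r
# Rebuild the nine sample rows; rows hidden behind "etc." are not included.
mydf <- data.frame(
  Factor1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42),
  stringsAsFactors = FALSE  # assumption: keep Factor1 as character
)

# Cross-tabulate to see how many observations each combination has.
combo_counts <- table(mydf$Factor1, mydf$Factor2)
print(combo_counts)
```

The cross-tabulation makes it easy to see which combinations fall below the threshold before any subsetting is done.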

How do I subset this data frame ("df", if you will) based on the condition that the combination of Factor1 and Factor2 (Fact1*Fact2) has more than, say, 2 observations? Can you use the length argument in subset to do this?

Recommended answer

Assuming your data.frame is called mydf, you can use ave to create a logical vector to help subset:

mydf[with(mydf, as.logical(ave(Factor1, Factor1, Factor2, 
                           FUN = function(x) length(x) > 2))), ]
#   Factor1 Factor2 Value
# 3       A       2  1.21
# 4       A       2  0.75
# 5       A       2  0.53
# 7       B       2  0.21
# 8       B       2  0.18
# 9       B       2  1.42

Here's ave counting up your combinations. Notice that ave returns an object the same length as the number of rows in your data.frame, which makes it convenient for subsetting.

> with(mydf, ave(Factor1, Factor1, Factor2, FUN = length))
[1] "2" "2" "3" "3" "3" "1" "3" "3" "3"

The next step is to compare that length to your threshold. For that we need an anonymous function for our FUN argument.

> with(mydf, ave(Factor1, Factor1, Factor2, FUN = function(x) length(x) > 2))
[1] "FALSE" "FALSE" "TRUE"  "TRUE"  "TRUE"  "FALSE" "TRUE"  "TRUE"  "TRUE"

Almost there... but since the first argument was a character vector, our output is also a character vector. Wrapping it in as.logical lets us use it directly for subsetting.
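
A possible variation, not part of the original answer: because the output takes its type from the first argument, passing the numeric Value column instead keeps the counts numeric, so the comparison can be done outside ave and as.logical is not needed. The sketch below rebuilds the sample rows so it runs on its own; the names n, keep and result are only illustrative.

```r
# Sample rows from the question (Factor1 assumed to be character).
mydf <- data.frame(
  Factor1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42),
  stringsAsFactors = FALSE
)

# With a numeric first argument, ave() returns numeric group sizes,
# one per row, aligned with the rows of mydf.
n <- with(mydf, ave(Value, Factor1, Factor2, FUN = length))

# Compare against the threshold to get a logical vector directly.
keep <- n > 2

# Keep only rows whose Factor1/Factor2 combination has more than 2 rows.
result <- mydf[keep, ]
print(result)
```

This selects the same rows as the as.logical version; which form to prefer is mostly a matter of taste.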

ave doesn't work this way when its first argument is of class factor, in which case you'll need to convert it to character first, something like:

mydf[with(mydf, as.logical(ave(as.character(Factor1), Factor1, Factor2, 
                               FUN = function(x) length(x) > 2))),]
