Question
Let's say you have a data frame with two factor columns that looks like this:
Factor1 Factor2 Value
A 1 0.75
A 1 0.34
A 2 1.21
A 2 0.75
A 2 0.53
B 1 0.42
B 2 0.21
B 2 0.18
B 2 1.42
etc.
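To make the example reproducible, here is a minimal sketch that builds the nine rows shown above as a data frame. The "etc." rows are unknown and are left out. The name mydf is borrowed from the answer below, and stringsAsFactors = FALSE (keeping Factor1 as character) is an assumption about how the data was read in.

```r
# Reconstruction of the nine example rows only; the "etc." rows are not known.
mydf <- data.frame(
  Factor1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42),
  stringsAsFactors = FALSE  # keep Factor1 as character (the default since R 4.0.0)
)
str(mydf)
```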
How do I subset this data frame ("df", if you will) based on the condition that the combination of Factor1 and Factor2 (Factor1*Factor2) has more than, say, 2 observations? Can you use the length argument in subset to do this?
Answer
Assuming your data.frame is called mydf, you can use ave to create a logical vector to help subset:
# Keep only rows whose Factor1/Factor2 combination has more than 2 observations
mydf[with(mydf, as.logical(ave(Factor1, Factor1, Factor2,
                               FUN = function(x) length(x) > 2))), ]
# Factor1 Factor2 Value
# 3 A 2 1.21
# 4 A 2 0.75
# 5 A 2 0.53
# 7 B 2 0.21
# 8 B 2 0.18
# 9 B 2 1.42
Here's ave counting up your combinations. Notice that ave returns an object the same length as the number of rows in your data.frame (this makes it convenient for subsetting).
> with(mydf, ave(Factor1, Factor1, Factor2, FUN = length))
[1] "2" "2" "3" "3" "3" "1" "3" "3" "3"
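As a cross-check on those per-row counts, base R's table() (a different function from the one the answer uses) tabulates each Factor1 x Factor2 combination once rather than once per row. This is a sketch on the nine example rows only:

```r
mydf <- data.frame(
  Factor1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42),
  stringsAsFactors = FALSE
)

# One cell per combination: rows are Factor1 levels, columns are Factor2 values
counts <- table(mydf$Factor1, mydf$Factor2)
counts
```

The cells should agree with the ave output above: A/1 has 2 rows, A/2 has 3, B/1 has 1 and B/2 has 3.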
The next step is to compare that length to your threshold. For that we need an anonymous function for our FUN argument.
> with(mydf, ave(Factor1, Factor1, Factor2, FUN = function(x) length(x) > 2))
[1] "FALSE" "FALSE" "TRUE" "TRUE" "TRUE" "FALSE" "TRUE" "TRUE" "TRUE"
Almost there... but since the first argument to ave was a character vector, our output is also a character vector. We convert it with as.logical so we can use it directly for subsetting.
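Putting that last step on its own, a short sketch (again on the nine example rows, with Factor1 assumed to be character) shows that as.logical turns the "TRUE"/"FALSE" strings into a real logical vector, which then selects rows 3-5 and 7-9. The variable name keep is just for illustration.

```r
mydf <- data.frame(
  Factor1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42),
  stringsAsFactors = FALSE
)

# ave() returns "TRUE"/"FALSE" strings here; as.logical() converts them
keep <- with(mydf, as.logical(ave(Factor1, Factor1, Factor2,
                                  FUN = function(x) length(x) > 2)))
class(keep)   # "logical"
which(keep)   # positions of the rows that survive the threshold
mydf[keep, ]
```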
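ave doesn't work as intended on objects of class factor: the per-group results it tries to store are not valid factor levels, so they come back as NA. If Factor1 is a factor (the data.frame() default before R 4.0.0, when stringsAsFactors defaulted to TRUE), you'll need to do something like: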
# Convert the factor to character first so ave can store the TRUE/FALSE results
mydf[with(mydf, as.logical(ave(as.character(Factor1), Factor1, Factor2,
                               FUN = function(x) length(x) > 2))), ]
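A variation that is not part of the answer above, offered as a sketch: passing the row indices (seq_along(Factor1), an integer vector) as ave's first argument means the group sizes come back as integers whatever the class of Factor1, so neither as.character nor as.logical is needed. On the question about subset: subset() has no length argument, but the same ave condition can be written as its subset expression, since subset() evaluates it inside the data frame. Here Factor1 is deliberately stored as a factor; the names n_per_group, big and big2 are illustrative only.

```r
# Example rows with Factor1 stored as a factor this time
mydf <- data.frame(
  Factor1 = factor(c("A", "A", "A", "A", "A", "B", "B", "B", "B")),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42)
)

# Group size for every row, computed on an integer vector of row positions
n_per_group <- with(mydf, ave(seq_along(Factor1), Factor1, Factor2, FUN = length))

# The comparison is already logical, so it can index rows directly
big <- mydf[n_per_group > 2, ]

# Same condition written inside subset()
big2 <- subset(mydf, ave(seq_along(Factor1), Factor1, Factor2, FUN = length) > 2)
big2
```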