Problem description
I have the following dataset:
id1<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
status<-c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
df<-data.frame(id1,status)
In df, 40% of my observations have status equal to 2. I am looking for a function to extract a sample of 10 observations from df while maintaining that proportion.
I have already seen "Stratified random sampling from data frame in R", but it does not address proportions.
Recommended answer
You can try the stratified function from my "splitstackshape" package:
library(splitstackshape)
stratified(df, "status", 10/nrow(df))
#     id1 status
#  1:   5      1
#  2:  12      1
#  3:   2      1
#  4:   1      1
#  5:   6      1
#  6:   9      1
#  7:  16      2
#  8:  17      2
#  9:  18      2
# 10:  15      2
Alternatively, using sample_frac from "dplyr":
library(dplyr)
df %>%
  group_by(status) %>%
  sample_frac(10/nrow(df))
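In recent dplyr releases (1.0.0 and later), sample_frac is marked as superseded in favour of slice_sample. As a hedged sketch, assuming such a version is installed, the same grouped draw could be written as follows. The object name smp and the seed are illustrative only and are not part of the original answer.

```r
library(dplyr)

# Same data as in the question
id1 <- 1:20
status <- c(rep(1, 12), rep(2, 8))
df <- data.frame(id1, status)

set.seed(1)  # arbitrary seed, only so the draw is reproducible

# Draw the same fraction of rows from every status group
smp <- df %>%
  group_by(status) %>%
  slice_sample(prop = 10 / nrow(df)) %>%
  ungroup()

# Number of sampled rows per status level
table(smp$status)
```

Note that slice_sample rounds prop times the group size down to a whole number of rows, so with other fractions the total can fall short of the target.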
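Both stratified and sample_frac take a stratified sample proportional to the original grouping variable (hence the use of 10/nrow(df), or, equivalently, 0.5). Because every group is sampled at the same fraction, 6 of the 12 rows with status 1 and 4 of the 8 rows with status 2 are drawn, so the 40% share of status 2 is preserved.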
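To make the idea concrete without any extra packages, here is a minimal base R sketch of proportional stratified sampling. The helper name sample_by_group, its arguments, and the seed are hypothetical and chosen only for illustration. It is not how splitstackshape or dplyr implement this internally.

```r
# Same data as in the question
id1 <- 1:20
status <- c(rep(1, 12), rep(2, 8))
df <- data.frame(id1, status)

set.seed(42)  # arbitrary seed, only so the draw is reproducible

# Hypothetical helper: sample the same fraction of rows within each level of `group`
sample_by_group <- function(data, group, frac) {
  # Row indices split by group level
  row_groups <- split(seq_len(nrow(data)), data[[group]])
  picked <- lapply(row_groups, function(rows) {
    # Index via sample.int so a group with a single row is handled correctly
    rows[sample.int(length(rows), round(length(rows) * frac))]
  })
  data[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

smp <- sample_by_group(df, "status", 10 / nrow(df))

# 6 rows with status 1 and 4 rows with status 2, i.e. 60% / 40%
table(smp$status)
prop.table(table(smp$status))
```

The per-group size is rounded, so for fractions that do not divide the group sizes evenly the total sample size may differ slightly from the target.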