This article covers how to draw a random sample without replacement in R so that a column sum falls strictly within a given range.
Question
I'm trying to draw a random sample of rows without replacement from a dataset such that the sum of a column in the sample is strictly within a range. For the example dataset mtcars, the random sample should be such that the sum of mpg is strictly between 90 and 100.
Reproducible example:
library(dplyr)  # provides %>%, sample_n(), summarise(), pull()

data("mtcars")

random_sample <- function(dataset) {
  final_mpg = 0
  while (final_mpg < 100) {
    basic_dat <- dataset %>%
      sample_n(1) %>%
      ungroup()
    total_mpg <- basic_dat %>%
      summarise(mpg = sum(mpg)) %>%
      pull(mpg)
    final_mpg <- final_mpg + total_mpg
    if (final_mpg > 90 & final_mpg < 100) {
      break
    }
    final_dat <- rbind(get0("final_dat"), get0("basic_dat"))
  }
  return(final_dat)
}

chosen_sample <- random_sample(mtcars)
But this function outputs samples with sum(mpg) > 100. How do I ensure that every sample it generates is strictly within that range? Any help is much appreciated.
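One way to see how the total can overshoot: each draw adds between 10.4 and 33.9 (the smallest and largest mpg in mtcars), so a running total can step from below 90 to above 100 without ever landing in between. A minimal illustration, assuming a hand-picked order of draws (the vector `draws` is hypothetical and chosen only to show the jump):

```r
# Hand-picked mpg values that occur in mtcars; the draw order is hypothetical.
draws <- c(21.0, 22.8, 21.4, 18.7, 33.9)

# Running total after each draw.
running_total <- cumsum(draws)
running_total

# Does the running total ever fall strictly between 90 and 100?
in_window <- any(running_total > 90 & running_total < 100)
in_window
```

With these values the total is 83.9 after four draws and 117.8 after the fifth, so the check on the 90-100 window never fires and the loop only stops once the total is already past 100.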
Recommended answer
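This works in that the sum never exceeds 100. Because of the values of mpg, though, the total cannot always get past 90: the loop stops as soon as the next row would push the sum over 100, so the returned sample can fall short of the lower bound.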
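ransmpl <- function(df) {
  # Start with one randomly chosen row
  s1 <- df[sample(rownames(df), 1), ]
  s11 <- sum(s1$mpg)
  while (s11 < 100) {
    # Draw the next row from those not yet in the sample (no replacement)
    rn2 <- rownames(df[!(rownames(df) %in% rownames(s1)), ])
    nr <- df[sample(rn2, 1), ]
    s11 <- sum(rbind(s1, nr)$mpg)
    # Stop before adding a row that would push the sum past 100
    if (s11 > 100) {
      break
    }
    s1 <- rbind(s1, nr)
  }
  return(s1)
}

chosen_sample <- ransmpl(mtcars)
chosen_sample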
Output
> chosen_sample
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
> sum(chosen_sample$mpg)
[1] 95.1
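If both bounds must hold on every call, one option is rejection sampling: shuffle the rows, accumulate mpg, and keep the first prefix whose sum lies strictly inside the window, reshuffling whenever the running total skips it. The sketch below is not part of the original answer; the function name `sample_within_range` and its parameters (`column`, `lower`, `upper`, `max_tries`) are hypothetical, and it makes no claim of sampling uniformly over all qualifying subsets.

```r
# Hypothetical helper: shuffle the rows, accumulate `column`, and return the
# shortest prefix whose sum lies strictly between `lower` and `upper`.
# If the running total jumps over the window, reshuffle and try again.
sample_within_range <- function(df, column = "mpg", lower = 90, upper = 100,
                                max_tries = 1000) {
  for (attempt in seq_len(max_tries)) {
    # A permutation of the row indices, so no row can be drawn twice
    shuffled <- df[sample(nrow(df)), , drop = FALSE]
    totals <- cumsum(shuffled[[column]])
    hit <- which(totals > lower & totals < upper)
    if (length(hit) > 0) {
      return(shuffled[seq_len(hit[1]), , drop = FALSE])
    }
  }
  stop("no sample found within the range after ", max_tries, " tries")
}

set.seed(42)  # only to make the example repeatable
chosen_sample <- sample_within_range(mtcars)
sum(chosen_sample$mpg)
```

The `max_tries` cap is there so that a window no subset can reach produces an error instead of an endless loop.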