Problem description
I am trying to take two randomly drawn subsamples from a data frame, extract the mean of a column in each subsample, and calculate the difference between the means. As far as I can tell, the function below and the use of replicate within do.call should work, but I keep getting an error message:
Example data:
> dput(a)
structure(list(index = 1:30, val = c(14L, 22L, 1L, 25L, 3L, 34L,
35L, 36L, 24L, 35L, 33L, 31L, 30L, 30L, 29L, 28L, 26L, 12L, 41L,
36L, 32L, 37L, 56L, 34L, 23L, 24L, 28L, 22L, 10L, 19L), id = c(1L,
2L, 2L, 3L, 3L, 4L, 5L, 6L, 7L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 16L, 16L, 17L, 18L, 19L, 20L, 21L, 21L, 22L, 23L, 24L,
25L)), .Names = c("index", "val", "id"), class = "data.frame", row.names = c(NA,
-30L))
Code:
# Function to select only one row for each unique id in the data frame,
# take 2 randomly drawn subsets of size 10 from this unique dataset,
# calculate means of both subsets and determine the difference between the two means
extractDiff <- function(P){
xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1) ,] }) # selects only one row for each id in the data frame
subA <- xA[sample(xA, 10, replace=TRUE), ] # takes a random sample of 10 rows
subB <- xA[sample(xA, 10, replace=TRUE), ] # takes a second random sample of 10 rows
meanA <- mean(subA$val)
meanB <- mean(subB$val)
diff <- abs(meanA-meanB)
outdf <- c(mA = meanA, mB= meanB, diffAB = diff)
return(outdf)
}
# To repeat the random selections and mean comparison X number of times...
fin <- do.call(rbind, replicate(10, extractDiff(a), simplify=FALSE))
Error message:
Error in xj[i] : invalid subscript type 'list'
I think the error has something to do with the function output not being returned in a format that can be fed to rbind, but nothing I try seems to work (e.g. I have tried converting the outdf object to a data frame and to a matrix and still get the error message).
I am still learning R, so I would be grateful for any help. Thanks!
Recommended answer
If you pass sample a list/data.frame as its first argument, it returns a list/data.frame. You can't use a data.frame to subset a data.frame, which is what triggers the error. Sample row indices with sample(nrow(xA), ...) instead, as in the corrected function below.
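To see the difference in isolation, here is a minimal sketch on a small made-up data frame. The name `df` and its values are hypothetical and are not the asker's data.

```r
# Hypothetical toy data frame, used only for illustration
df <- data.frame(index = 1:5, val = c(10, 20, 30, 40, 50))

# sample() on a data frame draws from its columns (a data frame is a list),
# so the result is itself a data frame, not a vector of row numbers
cols <- sample(df, 4, replace = TRUE)
class(cols)   # "data.frame"

# sample() on the row count returns an integer vector of row indices
rows <- sample(nrow(df), 4, replace = TRUE)
class(rows)   # "integer"

# Integer indices are valid row subscripts
sub <- df[rows, ]
nrow(sub)     # 4
```

Using `cols` as a row subscript, as in `df[cols, ]`, is the pattern that fails in the question.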
library(plyr)
extractDiff <- function(P){
xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1) ,] }) # selects only one row for each id in the data frame
subA <- xA[sample(nrow(xA), 10, replace=TRUE), ] # takes a random sample of 10 rows
subB <- xA[sample(nrow(xA), 10, replace=TRUE), ] # takes a second random sample of 10 rows
meanA <- mean(subA$val)
meanB <- mean(subB$val)
diff <- abs(meanA-meanB)
outdf <- c(mA = meanA, mB= meanB, diffAB = diff)
return(outdf)
}
set.seed(42)
fin <- do.call(rbind, replicate(10, extractDiff(a), simplify=FALSE))
# mA mB diffAB
# [1,] 29.4 25.5 3.9
# [2,] 25.8 23.0 2.8
# [3,] 25.3 29.5 4.2
# [4,] 29.0 31.2 2.2
# [5,] 26.5 25.6 0.9
# [6,] 26.8 27.2 0.4
# [7,] 28.7 27.3 1.4
# [8,] 22.7 28.7 6.0
# [9,] 30.6 23.2 7.4
# [10,] 25.1 25.2 0.1
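If plyr is not available, the per-id selection can also be sketched in base R. This is an alternative under the assumption that ddply is only needed to keep one row per id; `extractDiffBase`, its `n` argument, and the data frame `toy` are names invented for this example and are not part of the original answer.

```r
# Hypothetical base-R variant of extractDiff (no plyr dependency)
extractDiffBase <- function(P, n = 10) {
  # keep one randomly chosen row per id
  xA <- do.call(rbind, lapply(split(P, P$id), function(x) {
    x[sample(nrow(x), 1), , drop = FALSE]
  }))
  # two samples of n rows each, drawn with replacement
  subA <- xA[sample(nrow(xA), n, replace = TRUE), ]
  subB <- xA[sample(nrow(xA), n, replace = TRUE), ]
  meanA <- mean(subA$val)
  meanB <- mean(subB$val)
  c(mA = meanA, mB = meanB, diffAB = abs(meanA - meanB))
}

# Made-up data with repeated ids, for illustration only
toy <- data.frame(index = 1:8,
                  val   = c(14, 22, 1, 25, 3, 34, 35, 36),
                  id    = c(1, 2, 2, 3, 3, 4, 5, 6))

# Each replicate returns a named numeric vector; rbind stacks them into a matrix
fin <- do.call(rbind, replicate(5, extractDiffBase(toy), simplify = FALSE))
dim(fin)   # 5 3
```

Because each call returns a named numeric vector, rbind produces a numeric matrix with columns mA, mB and diffAB, the same shape as the output shown above.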