Problem description
As an example of my data, I have a data frame containing GROUP 1 with three rows of data and GROUP 2 with two rows of data:
GROUP  VARIABLE 1  VARIABLE 2  VARIABLE 3
1      2           6           5
1      4           NA          1
1      NA          3           8
2      1           NA          2
2      9           NA          NA
I would like to sample a single value per column from GROUP 1 to make a new row representing GROUP 1. I do not want to sample one complete row from GROUP 1; rather, the sampling needs to happen independently for each column. I would like to do the same for GROUP 2. Also, the sampling should ignore NAs, unless all rows of that group's variable are NA (such as GROUP 2, VARIABLE 2, above).
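The per-column rule described above can be sketched as a small helper. This is a minimal sketch, not code from the question or the answer; the name sample_one is hypothetical. It drops NAs, returns an NA of the column's own type when nothing else is left, and otherwise draws one of the remaining values.

```r
# Hypothetical helper (not from the original post): draw one value from a
# column's values within a group, ignoring NAs unless every value is NA.
sample_one <- function(x) {
  ok <- x[!is.na(x)]                   # non-NA candidates
  if (length(ok) == 0L) return(x[1L])  # all NA: x[1L] is an NA of x's own type
  ok[sample.int(length(ok), 1L)]       # draw an index, so a single candidate is returned as-is
}

set.seed(1)
sample_one(c(2L, 4L, NA))  # 2 or 4, never NA
sample_one(c(NA, NA))      # NA
sample_one(c(2L, NA))      # always 2
```

Drawing an index with sample.int() rather than calling sample(ok, 1) matters here: when ok is a single number n >= 1, sample() draws from 1:n instead of returning n, as documented in ?sample.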
For example, one possible result of the sampling is:
GROUP  VARIABLE 1  VARIABLE 2  VARIABLE 3
1      4           6           1
2      9           NA          2
Only GROUP 2, VARIABLE 2, can result in NA here. In reality I have 39 groups, 50,000+ variables, and a substantial number of NAs. I would sincerely appreciate code that builds a new data frame with one row per group, each row holding that group's sampling results.
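For the full task (one new row per group), a base R sketch is shown here. It is an alternative to the data.table answer, not a reproduction of it; the helper names sample_one and sample_groups, and the column names VARIABLE1 to VARIABLE3, are assumptions for illustration. It has not been timed on 39 groups and 50,000+ columns.

```r
# Hypothetical helper: one value per column, ignoring NAs unless all are NA.
sample_one <- function(x) {
  ok <- x[!is.na(x)]
  if (length(ok) == 0L) return(x[1L])  # typed NA when the whole column is NA
  ok[sample.int(length(ok), 1L)]       # index-based draw, safe for a single candidate
}

# Hypothetical wrapper: split the data frame by group and sample every
# column independently, giving one row per group.
sample_groups <- function(df, group_col = "GROUP") {
  pieces <- split(df, df[[group_col]])  # list of per-group data frames
  rows <- lapply(pieces, function(g) {
    # The group column is constant within g, so it "samples" to itself.
    as.data.frame(lapply(g, sample_one), check.names = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# The example data from the question (column names assumed).
df1 <- data.frame(
  GROUP     = c(1L, 1L, 1L, 2L, 2L),
  VARIABLE1 = c(2L, 4L, NA, 1L, 9L),
  VARIABLE2 = c(6L, NA, 3L, NA, NA),
  VARIABLE3 = c(5L, 1L, 8L, 2L, NA)
)

set.seed(42)
res <- sample_groups(df1)
res
```

Returning x[1L] for an all-NA column keeps every group's result the same type as the original column, so rbind() can stack the per-group rows without coercion.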
Recommended answer
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'GROUP', and loop over the columns of .SD with lapply(): if all of the elements are NA we return NA, otherwise we sample one of the non-NA elements.
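library(data.table)
setDT(df1)  # convert the data.frame to a data.table by reference
df1[, lapply(.SD, function(x) {
  ok <- x[!is.na(x)]  # non-NA values of this column within the group
  if (length(ok) == 0L) x[1L] else ok[sample.int(length(ok), 1L)]  # x[1L] is NA of x's type; sampling an index avoids sample(n, 1) drawing from 1:n
}), by = GROUP]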
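To try the data.table approach on the question's own example, the sketch below builds that data as a data.table (the column names VARIABLE1 to VARIABLE3 are assumed) and applies the grouped lapply(.SD, ...) pattern. It draws an index with sample.int() instead of calling sample(na.omit(x), 1): for GROUP 2, VARIABLE 3 only the value 2 is non-NA, and sample() given a single number n >= 1 draws from 1:n, so it could return 1. The last line illustrates that behaviour.

```r
library(data.table)

# The question's example (column names assumed).
df1 <- data.table(
  GROUP     = c(1L, 1L, 1L, 2L, 2L),
  VARIABLE1 = c(2L, 4L, NA, 1L, 9L),
  VARIABLE2 = c(6L, NA, 3L, NA, NA),
  VARIABLE3 = c(5L, 1L, 8L, 2L, NA)
)

set.seed(123)
res <- df1[, lapply(.SD, function(x) {
  ok <- x[!is.na(x)]
  # All NA: x[1L] keeps the column type, so every group returns the same type.
  if (length(ok) == 0L) x[1L] else ok[sample.int(length(ok), 1L)]
}), by = GROUP]
print(res)

# The pitfall avoided above: a single remaining value 2 is treated as 1:2.
table(replicate(1000, sample(na.omit(c(2L, NA)), 1)))
```

Keeping the return type consistent across groups matters because data.table combines the per-group results column by column.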