Applying a function to groups of columns

Problem description

How can I use apply or a related function to create a new data frame that contains the row averages of each group of columns in a very large data frame?

I have an instrument that outputs n replicate measurements on a large number of samples, where each single measurement is a vector (all measurement vectors have the same length). I'd like to calculate the average (and other statistics) over all replicate measurements of each sample.
This means I need to group n consecutive columns together and do row-wise calculations.

For a simple example, with three replicate measurements on two samples, how can I end up with a data frame that has two columns (one per sample): one that is the row-wise average of the replicates in dat$a, dat$b and dat$c, and one that is the row-wise average of dat$d, dat$e and dat$f?

Here's some example data:

    dat <- data.frame(a = rnorm(16), b = rnorm(16), c = rnorm(16),
                      d = rnorm(16), e = rnorm(16), f = rnorm(16))

                a          b            c          d           e          f
    1  -0.9089594 -0.8144765  0.872691548  0.4051094 -0.09705234 -1.5100709
    2   0.7993102  0.3243804  0.394560355  0.6646588  0.91033497  2.2504104
    3   0.2963102 -0.2911078 -0.243723116  1.0661698 -0.89747522 -0.8455833
    4  -0.4311512 -0.5997466 -0.545381175  0.3495578  0.38359390  0.4999425
    5  -0.4955802  1.8949285 -0.266580411  1.2773987 -0.79373386 -1.8664651
    6   1.0957793 -0.3326867 -1.116623982 -0.8584253  0.83704172  1.8368212
    7  -0.2529444  0.5792413 -0.001950741  0.2661068  1.17515099  0.4875377
    8   1.2560402  0.1354533  1.440160168 -2.1295397  2.05025701  1.0377283
    9   0.8123061  0.4453768  1.598246016  0.7146553 -1.09476532  0.0600665
    10  0.1084029 -0.4934862 -0.584671816 -0.8096653  1.54466019 -1.8117459
    11 -0.8152812  0.9494620  0.100909570  1.5944528  1.56724269  0.6839954
    12  0.3130357  2.6245864  1.750448404 -0.7494403  1.06055267  1.0358267
    13  1.1976817 -1.2110708  0.719397607 -0.2690107  0.83364274 -0.6895936
    14 -2.1860098 -0.8488031 -0.302743475 -0.7348443  0.34302096 -0.8024803
    15  0.2361756  0.6773727  1.279737692  0.8742478 -0.03064782 -0.4874172
    16 -1.5634527 -0.8276335  0.753090683  2.0394865  0.79006103  0.5704210

I'm after something like this:

                X1          X2
    1  -0.28358147 -0.40067128
    2   0.50608365  1.27513471
    3  -0.07950691 -0.22562957
    4  -0.52542633  0.41103139
    5   0.37758930 -0.46093340
    6  -0.11784382  0.60514586
    7   0.10811540  0.64293184
    8   0.94388455  0.31948189
    9   0.95197629 -0.10668118
    10 -0.32325169 -0.35891702
    11  0.07836345  1.28189698
    12  1.56269017  0.44897971
    13  0.23533617 -0.04165384
    14 -1.11251880 -0.39810121
    15  0.73109533  0.11872758
    16 -0.54599850  1.13332286

which I produced with the code below, but that is obviously no good for my much larger data frame:

    data.frame(cbind(
      apply(cbind(dat$a, dat$b, dat$c), 1, mean),
      apply(cbind(dat$d, dat$e, dat$f), 1, mean)))

I've tried apply and loops and can't quite get it together. My actual data has several hundred columns.

Solution

This may be more generalizable to your situation in that you pass a list of indices. If speed is an issue (large data frame) I'd opt for lapply with do.call rather than sapply:

    # One element per sample: the column positions of its replicates
    x <- list(1:3, 4:6)
    do.call(cbind, lapply(x, function(i) rowMeans(dat[, i])))

It also works if you only have column names:

    x <- list(c('a', 'b', 'c'), c('d', 'e', 'f'))
    do.call(cbind, lapply(x, function(i) rowMeans(dat[, i])))

EDIT

It just occurred to me that you may want to automate this for every three columns. I know there's a better way, but here it is on a 100-column data set:

    dat <- data.frame(matrix(rnorm(16 * 100), ncol = 100))
    n <- 1:ncol(dat)
    # Pad the column indices with NA so they fill a 3-column matrix, one row per group
    ind <- matrix(c(n, rep(NA, 3 - ncol(dat) %% 3)), byrow = TRUE, ncol = 3)
    # Drop incomplete groups (rows containing NA), then make each group a column
    ind <- data.frame(t(na.omit(ind)))
    do.call(cbind, lapply(ind, function(i) rowMeans(dat[, i])))

EDIT 2

I'm still not happy with the indexing. I think there's a better/faster way to pass the indexes. Here's a second, though still unsatisfying, method:

    n <- 1:ncol(dat)
    # Fill the index matrix column-wise so that each column is one group of three
    ind <- data.frame(matrix(c(n, rep(NA, 3 - ncol(dat) %% 3)), byrow = FALSE, nrow = 3))
    # Keep only the groups that contain no NA padding
    nonna <- sapply(ind, function(x) all(!is.na(x)))
    ind <- ind[, nonna]
    do.call(cbind, lapply(ind, function(i) rowMeans(dat[, i])))
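The answer says there may be a cleaner way to build the indices. One possible sketch is to label each column with its group number and split the column positions on that label. The helper name group_row_stat and its arguments n and fun are hypothetical (they are not part of the original answer), and the sketch assumes that replicates sit in consecutive columns and that leftover columns which do not fill a whole group should be dropped, as in the snippets above.

```r
# Hypothetical helper (not from the original answer): apply a row-wise
# summary function to every block of `n` consecutive columns.
group_row_stat <- function(dat, n = 3, fun = rowMeans) {
  # Number of complete groups; trailing leftover columns are ignored
  n_groups <- ncol(dat) %/% n
  stopifnot(n_groups >= 1)
  # Group label per kept column: 1,1,1,2,2,2,... (for n = 3)
  grp <- rep(seq_len(n_groups), each = n)
  # List of column positions, one element per group
  idx <- split(seq_len(n_groups * n), grp)
  # `fun` receives a numeric matrix and must return one value per row
  out <- lapply(idx, function(i) fun(as.matrix(dat[, i, drop = FALSE])))
  out <- as.data.frame(do.call(cbind, out))
  names(out) <- paste0("X", seq_len(n_groups))
  out
}

set.seed(1)  # only so the illustration is reproducible
dat <- data.frame(matrix(rnorm(16 * 6), ncol = 6))

# Row means per sample (two columns: X1, X2)
means <- group_row_stat(dat, n = 3)

# Another row-wise statistic, here the standard deviation of the replicates
sds <- group_row_stat(dat, n = 3, fun = function(m) apply(m, 1, sd))
```

Passing the summary as a function keeps the grouping logic in one place, so the "other stats" mentioned in the question only need a different fun. rowMeans is vectorised and should be noticeably faster than apply(m, 1, mean) on wide data.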
09-05 17:54