R aggregate with multiple arguments in function

Question

I'm trying to avoid a time-consuming for loop by using aggregate on a data.frame. But I need the values of one of the columns to enter the final computation.

    dat <- data.frame(key = c('a', 'b', 'a', 'b'),
                      rate = c(0.5, 0.4, 1, 0.6),
                      v1 = c(4, 0, 3, 1),
                      v2 = c(2, 0, 9, 4))

    > dat
      key rate v1 v2
    1   a  0.5  4  2
    2   b  0.4  0  0
    3   a  1.0  3  9
    4   b  0.6  1  4

    aggregate(dat[,-1], list(key=dat$key),
              function(x, y=dat$rate){
                rates <- as.numeric(y)
                values <- as.numeric(x)
                return(sum(values*rates)/sum(rates))
              })

Note: The function is just an example!

The problem with this implementation is that y=dat$rate gives all 4 rates in dat, when what I want is just the 2 rates that belong to the group being aggregated. Any suggestions on how I could do this? Thanks!

Solution

Here's what I managed to achieve, using the "data.table" package:

    library(data.table)
    DT <- data.table(dat, key = "key")
    DT[, list(v1 = sum(rate * v1)/sum(rate),
              v2 = sum(rate * v2)/sum(rate)),
       by = "key"]
    #    key       v1       v2
    # 1:   a 3.333333 6.666667
    # 2:   b 0.600000 2.400000

OK. So that's easy to write out for just two variables, but what about when we have a lot more columns? Use lapply(.SD, ...) in conjunction with your function.

First, some data:

    set.seed(1)
    dat <- data.frame(key = rep(c("a", "b"), times = 10),
                      rate = runif(20, min = 0, max = 1),
                      v1 = sample(10, 20, replace = TRUE),
                      v2 = sample(20, 20, replace = TRUE),
                      v3 = sample(30, 20, replace = TRUE),
                      x1 = sample(5, 20, replace = TRUE),
                      x2 = sample(6:10, 20, replace = TRUE),
                      x3 = sample(11:15, 20, replace = TRUE))
    library(data.table)
    datDT <- data.table(dat, key = "key")
    datDT
    #     key       rate v1 v2 v3 x1 x2 x3
    #  1:   a 0.26550866 10 17 28  3  9 15
    #  2:   a 0.57285336  7 16 14  2  7 13
    #  3:   a 0.20168193  3 11 20  4  9 14
    #  4:   a 0.94467527  1  1 15  4  6 13
    #  5:   a 0.62911404  9 15  3  2 10 12
    #  6:   a 0.20597457  5 10 11  2 10 13
    #  7:   a 0.68702285  5  9 11  4  7 11
    #  8:   a 0.76984142  9  2 15  4  6 15
    #  9:   a 0.71761851  8  7 26  3  9 13
    # 10:   a 0.38003518  8 14 24  5  8 15
    # 11:   b 0.37212390  3 13  9  4  7 13
    # 12:   b 0.90820779  2 12 10  2 10 11
    # 13:   b 0.89838968  4 16  8  2  7 13
    # 14:   b 0.66079779  4 10 23  1  8 12
    # 15:   b 0.06178627  4 14 27  1  8 13
    # 16:   b 0.17655675  6 18 26  1  9 11
    # 17:   b 0.38410372  2  5 11  5  8 14
    # 18:   b 0.49769924  7  2 27  4  6 13
    # 19:   b 0.99190609  2 11 12  3  6 13
    # 20:   b 0.77744522  5  9 29  4  9 13

(The printed values are those of the original answer. R 3.6.0 and later use a different default algorithm for sample(), so the integer columns may differ on a current installation.)

Second, aggregate:

    datDT[, lapply(.SD, function(x, y = rate) sum(y * x)/sum(y)), by = "key"]
    #    key      rate       v1        v2       v3       x1       x2       x3
    # 1:   a 0.6501303 6.335976  8.634691 15.75915 3.363832 7.658762 13.19152
    # 2:   b 0.7375793 3.595585 10.749705 16.26582 2.792390 7.741787 12.57301

Because .SD holds every non-grouping column, the rate column itself is also passed through the function, which is why the result contains a rate column.

If you have a really large dataset, you might want to explore data.table in general.

For what it is worth, I was also successful in base R, but I'm not sure how efficient this would be, particularly because of the transposing and so on.

    # Split the rows by the first column (key), then for each group compute
    # the rate-weighted mean of columns 3 onwards; column 2 holds the rate.
    t(sapply(split(dat, dat[1]), function(x, y = 3:ncol(dat)) {
      V1 <- vector()
      for (i in seq_along(y)) {
        V1[i] <- sum(x[2] * x[y[i]])/sum(x[2])
      }
      V1
    }))
    #       [,1]      [,2]     [,3]     [,4]     [,5]     [,6]
    # a 6.335976  8.634691 15.75915 3.363832 7.658762 13.19152
    # b 3.595585 10.749705 16.26582 2.792390 7.741787 12.57301
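As a further illustration (not part of the original answer), the same per-group weighted average can be sketched in base R with split() and the built-in weighted.mean(), which avoids both the explicit loop and the extra package. The name value_cols is only an illustrative choice, and the data is the four-row example from the question. The key point is that each group's data frame carries its own rates, so only those rates are used as weights.

```r
# Small example data from the question
dat <- data.frame(key  = c("a", "b", "a", "b"),
                  rate = c(0.5, 0.4, 1, 0.6),
                  v1   = c(4, 0, 3, 1),
                  v2   = c(2, 0, 9, 4))

# Columns to average (hypothetical helper name, chosen for illustration)
value_cols <- c("v1", "v2")

# Split the rows by key; within each group, take the rate-weighted mean of
# every value column, using only the rates that belong to that group.
res <- t(sapply(split(dat, dat$key), function(d) {
  sapply(d[value_cols], weighted.mean, w = d$rate)
}))

print(res)
```

For this data the result agrees with the two-column data.table output in the answer (about 3.33 and 6.67 for key a, 0.6 and 2.4 for key b). Whether this is faster or slower than data.table on large data has not been measured here.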