This post covers how to compute a grouped ("group by" style) weighted mean in R using ddply and weighted.mean.

Problem description

I am trying to compute a "group by" style weighted mean in R. For a plain mean, the following code (using Hadley's plyr package) worked well.

ddply(mydf,.(period),mean)

If I use the same approach with weighted.mean, I get the error "'x' and 'w' must have the same length", which I do not understand, because the weighted.mean call works outside ddply.

weighted.mean(mydf$mycol, mydf$myweight) # works just fine
ddply(mydf, .(period), weighted.mean, mydf$mycol, mydf$myweight) # returns the error described above
ddply(mydf, .(period), weighted.mean(mydf$mycol, mydf$myweight)) # different code, same story

I thought of writing a custom function instead of using weighted.mean and passing it to ddply, or even writing something from scratch with subset. That seems like too much work for my case, and I hope there is a smarter solution using what is already there.

Thanks in advance for any suggestions!

Recommended answer

Use an anonymous function:

> ddply(iris,"Species",function(X) data.frame(wmn=weighted.mean(X$Sepal.Length,
+                                                               X$Petal.Length),
+                                             mn=mean(X$Sepal.Length)))
     Species      wmn    mn
1     setosa 5.016963 5.006
2 versicolor 5.978075 5.936
3  virginica 6.641535 6.588
>

This computes the weighted mean of Sepal.Length (weighted by Petal.Length) as well as the unweighted mean, and returns both.
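The same pattern should carry over to the data frame from the question. The likely cause of the original error is that ddply passes each group's data frame as the first argument to the function, while mydf$mycol and mydf$myweight are still the full, ungrouped columns, so weighted.mean receives an x and a w of different lengths. Below is a minimal sketch under assumptions: the column names period, mycol and myweight come from the question, but the values are made up for illustration.

```r
library(plyr)

# Hypothetical data: column names mirror the question, values are invented.
mydf <- data.frame(
  period   = c("a", "a", "b", "b"),
  mycol    = c(1, 3, 10, 20),
  myweight = c(1, 3, 1, 1)
)

# X contains only the rows of one period, so the values and the weights
# passed to weighted.mean always have the same length.
res <- ddply(mydf, .(period), function(X) {
  data.frame(wmn = weighted.mean(X$mycol, X$myweight))
})

# Equivalent form using summarise: the bare column names are evaluated
# inside each group's piece rather than on the full data frame.
res2 <- ddply(mydf, .(period), summarise,
              wmn = weighted.mean(mycol, myweight))

print(res)
print(res2)
```

With this sample data both calls give 2.5 for period "a" and 15 for period "b". The summarise form is shorter when only a few summary columns are needed; the anonymous function is more flexible if each group needs more involved processing.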

