Problem description
I would like to write a function using ddply that outputs summary statistics for two columns of the data.frame mat, given the column names.
mat is a big data.frame with columns named "metric", "length", "species", "tree", ..., "index".
index is a factor with 2 levels, "Short" and "Long".
"metric", "length", "species", "tree" and the others are all continuous variables.
Function:
summary1 <- function(arg1,arg2) {
...
ss <- ddply(mat, .(index), function(X) data.frame(
arg1 = as.list(summary(X$arg1)),
arg2 = as.list(summary(X$arg2)),
.parallel = FALSE)
ss
}
I expect the output to look like this after calling summary1("metric","length"):
Short metric.Min. metric.1st.Qu. metric.Median metric.Mean metric.3rd.Qu. metric.Max. length.Min. length.1st.Qu. length.Median length.Mean length.3rd.Qu. length.Max.
....
Long metric.Min. metric.1st.Qu. metric.Median metric.Mean metric.3rd.Qu. metric.Max. length.Min. length.1st.Qu. length.Median length.Mean length.3rd.Qu. length.Max.
....
At the moment the function does not produce the desired output. What modification should be made here?
Thanks for your help.
Here is a toy example
mat <- data.frame(
metric = rpois(10,10), length = rpois(10,10), species = rpois(10,10),
tree = rpois(10,10), index = c(rep("Short",5),rep("Long",5))
)
Solution
As Nick wrote in his answer, you can't use $ to reference a column whose name is passed as a character string. When you write X$arg1, R searches for a column literally named "arg1" in the data.frame X. You can reference the column with either X[, arg1] or X[[arg1]].
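The difference between the two kinds of lookup can be checked on a small data frame. This is a minimal sketch; the data frame X and its values are made up for illustration and are not the toy data from the question.

```r
# Hypothetical three-row data frame, for illustration only
X <- data.frame(metric = c(1, 2, 3), length = c(4, 5, 6))
arg1 <- "metric"  # the column name is held in a variable

by_dollar  <- X$arg1      # NULL: $ looks for a column literally named "arg1"
by_bracket <- X[[arg1]]   # the metric column, found via the value of arg1
by_index   <- X[, arg1]   # same column, using matrix-style indexing

print(is.null(by_dollar))
print(by_bracket)
```

Both X[[arg1]] and X[, arg1] return the column as a plain vector here, which is what summary() needs.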
If you want nicely named output, I propose the solution below:
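library(plyr)  # provides ddply() and .()
summary1 <- function(arg1, arg2) {
  # arg1 and arg2 are column names given as character strings
  ss <- ddply(mat, .(index), function(X) data.frame(
    setNames(
      # [[ ]] looks each column up by the name stored in the variable
      list(as.list(summary(X[[arg1]])), as.list(summary(X[[arg2]]))),
      c(arg1, arg2)
    )), .parallel = FALSE)
  ss
}
summary1("metric", "length")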
Output for toy data is:
index metric.Min. metric.1st.Qu. metric.Median metric.Mean metric.3rd.Qu.
1 Long 5 7 10 8.6 10
2 Short 7 7 9 8.8 10
metric.Max. length.Min. length.1st.Qu. length.Median length.Mean length.3rd.Qu.
1 11 9 10 11 10.8 12
2 11 4 9 9 9.0 11
length.Max.
1 12
2 12
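To see where column names such as metric.Min. come from, the per-group step can be run on its own in base R, without ddply. The following is a minimal sketch, assuming a small made-up data frame X that stands in for a single group. The names X, cols, stats, and one_row are illustrative and do not appear in the original answer.

```r
# Hypothetical stand-in for one group's rows (values are made up)
X <- data.frame(metric = c(5, 7, 10, 10, 11),
                length = c(9, 10, 11, 12, 12))
cols <- c("metric", "length")

# summary() returns a named numeric vector; as.list() keeps those names
stats <- lapply(cols, function(col) as.list(summary(X[[col]])))

# Naming the outer list by column makes data.frame() prefix every
# statistic with its column name, giving one wide row
one_row <- data.frame(setNames(stats, cols))
print(names(one_row))
```

ddply then stacks one such row per level of index, which appears to be how the wide two-row table shown in the output is built.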