Problem description
I'm having trouble getting a custom function to run over each group in a data frame.
Here is some sample data:
set.seed(42)
tm <- as.numeric(c("1", "2", "3", "3", "2", "1", "2", "3", "1", "1"))
d <- as.numeric(sample(0:2, size = 10, replace = TRUE))
t <- as.numeric(sample(0:2, size = 10, replace = TRUE))
h <- as.numeric(sample(0:2, size = 10, replace = TRUE))
df <- as.data.frame(cbind(tm, d, t, h))
df$p <- rowSums(df[2:4])
I created a custom function to calculate the value w:
calc <- function(x) {
  data <- x
  w <- (1.27*sum(data$d) + 1.62*sum(data$t) + 2.10*sum(data$h)) / sum(data$p)
  w
}
When I run the function on the entire data set, I get the following answer:
calc(df)
[1] 1.664474
Ideally, I want to return results that are grouped by tm, e.g.:
tm w
1 result of calc
2 result of calc
3 result of calc
So far I have tried using aggregate with my function, but I get the following error:
aggregate(df, by = list(tm), FUN = calc)
Error in data$d : $ operator is invalid for atomic vectors
I feel like I have stared at this for too long and there is an obvious answer. Any advice would be appreciated.
Recommended answer
Using dplyr
library(dplyr)
df %>%
  group_by(tm) %>%
  do(data.frame(val=calc(.)))
# tm val
#1 1 1.665882
#2 2 1.504545
#3 3 1.838000
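In recent dplyr releases do() is marked as superseded, so the same per-group call can also be sketched with group_modify(). This is only an illustration: it assumes a dplyr version that provides group_modify(), and it uses a small fixed data frame with made-up values instead of the question's random sample.

```r
library(dplyr)

# Illustrative fixed data (hypothetical values, not the question's sample)
df <- data.frame(
  tm = c(1, 1, 2, 2, 3),
  d  = c(1, 0, 2, 1, 0),
  t  = c(0, 2, 1, 1, 2),
  h  = c(2, 1, 0, 1, 1)
)
df$p <- rowSums(df[c("d", "t", "h")])

# Same formula as the question's calc(); x is a whole data frame
calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# group_modify() passes each group's rows (without the grouping column)
# to the formula as .x and expects a data frame back
res <- df %>%
  group_by(tm) %>%
  group_modify(~ data.frame(val = calc(.x))) %>%
  ungroup()

print(res)
```

The result has one row per tm value, with the grouping column added back next to val.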
If we change the function slightly so that it takes the columns as separate arguments, it also works with summarise:
calc1 <- function(d1, t1, h1, p1) {
  (1.27*sum(d1) + 1.62*sum(t1) + 2.10*sum(h1)) / sum(p1)
}
df %>%
  group_by(tm) %>%
  summarise(val=calc1(d, t, h, p))
# tm val
#1 1 1.665882
#2 2 1.504545
#3 3 1.838000
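For reference, the aggregate call in the question fails because aggregate applies FUN to each column of each group separately, so calc receives a plain numeric vector and data$d is invalid. A base R sketch that hands calc a whole data frame per group could use split() and sapply(). The data below is a small fixed example with made-up values, not the question's random sample.

```r
# Illustrative fixed data (hypothetical values, not the question's sample)
df <- data.frame(
  tm = c(1, 1, 2, 2, 3),
  d  = c(1, 0, 2, 1, 0),
  t  = c(0, 2, 1, 1, 2),
  h  = c(2, 1, 0, 1, 1)
)
df$p <- rowSums(df[c("d", "t", "h")])

# Same formula as the question's calc(); x is a whole data frame
calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# split() returns one data frame per tm value, so calc() sees all columns
w <- sapply(split(df, df$tm), calc)

# Reshape the named vector into the tm / w layout asked for in the question
res <- data.frame(tm = as.numeric(names(w)), w = unname(w))
print(res)
```

This keeps calc unchanged and needs no extra packages, at the cost of rebuilding the result data frame by hand.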