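I'm trying to calculate the mean of my data, but I'm struggling with two things: 1. getting the right layout, and 2. including the missing values in the result.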

#My input data:
Stock <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B")
Soil <- c("Blank", "Blank", "Control", "Control", "Clay", "Clay", "Blank", "Blank", "Control", "Control", "Clay", "Clay")
Nitrogen <- c(NA, NA, 0, 0, 20, 20, NA, NA, 0, 0, 20, 20)
Respiration <- c(112, 113, 124, 126, 139, 137, 109, 111, 122, 124, 134, 136)
d <- as.data.frame(cbind(Stock, Soil, Nitrogen, Respiration))

#The outcome I'd like to get:
Stockr <- c("A", "A", "A", "B", "B", "B")
Soilr <- c("Blank", "Control", "Clay", "Blank", "Control", "Clay")
Nitrogenr <- c(NA, 0, 20, NA, 0, 20)
Respirationr <- c(112.5, 125, 138, 110, 123, 135)
result <- as.data.frame(cbind(Stockr, Soilr, Nitrogenr, Respirationr))

Many thanks for your help!

Best answer

Here is a solution using ddply from the plyr package:

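library(plyr)
# One row per Stock/Soil/Nitrogen combination (NA groups are kept);
# Respiration is converted back to numbers before averaging
ddply(d, .(Stock, Soil, Nitrogen), summarise,
      Respiration = mean(as.numeric(as.character(Respiration))))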

#   Stock    Soil Nitrogen Respiration
# 1     A   Blank     <NA>       112.5
# 2     A    Clay       20       138.0
# 3     A Control        0       125.0
# 4     B   Blank     <NA>       110.0
# 5     B    Clay       20       135.0
# 6     B Control        0       123.0
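For comparison, here is a base R sketch that avoids plyr. It is an alternative approach, not part of the original answer, and it assumes that Nitrogen is constant within each Soil type (true for the data in the question). Instead of grouping by Nitrogen, it averages Nitrogen alongside Respiration, with `na.action = na.pass` so that the Blank rows are not dropped; the mean of two NAs is NA. A factor with explicit levels gives the Blank, Control, Clay order the question asks for.

```r
# Question's data, built with data.frame() so the numeric columns stay numeric
Stock <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B")
Soil <- c("Blank", "Blank", "Control", "Control", "Clay", "Clay",
          "Blank", "Blank", "Control", "Control", "Clay", "Clay")
Nitrogen <- c(NA, NA, 0, 0, 20, 20, NA, NA, 0, 0, 20, 20)
Respiration <- c(112, 113, 124, 126, 139, 137, 109, 111, 122, 124, 134, 136)

# Explicit factor levels fix the row order: Blank, Control, Clay
d <- data.frame(Stock,
                Soil = factor(Soil, levels = c("Blank", "Control", "Clay")),
                Nitrogen, Respiration)

# Average Nitrogen and Respiration within each Stock/Soil group.
# na.pass keeps rows whose Nitrogen is NA; mean(c(NA, NA)) is NA.
res <- aggregate(cbind(Nitrogen, Respiration) ~ Stock + Soil, data = d,
                 FUN = mean, na.action = na.pass)

# Sort by Stock, then by Soil level, to match the requested layout
res <- res[order(res$Stock, res$Soil), ]
rownames(res) <- NULL
print(res)
```

Grouping directly by an NA-containing column with `aggregate` would drop those groups, which is why this sketch moves Nitrogen to the averaged side; plyr's ddply, as used above, keeps NA groups instead.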

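Note that cbind is not a good way to create a data frame. You should use data.frame(Stock, Soil, Nitrogen, Respiration) instead. cbind first builds a matrix, and a matrix can hold only one type, so every number is turned into a character string. Because of this, all columns of d are factors (or character vectors in R 4.0 and later, where stringsAsFactors defaults to FALSE). That is why I used as.numeric(as.character(Respiration)) to get the numeric values of that column.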
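A small sketch of the coercion described above, using a shortened, hypothetical subset of the question's vectors. It also shows why `as.character()` is needed before `as.numeric()`: on a factor, `as.numeric()` returns the internal level codes, not the values. The factor is created explicitly here to mimic what older R versions produced automatically.

```r
# Shortened, illustrative subset of the question's data
Stock <- c("A", "A", "B")
Respiration <- c(112, 113, 109)

# cbind() returns a matrix with a single type, so the numbers become strings
m <- cbind(Stock, Respiration)
typeof(m)                      # "character"

# Mimic pre-R-4.0 behaviour, where this column became a factor
f <- factor(m[, "Respiration"])

as.numeric(f)                  # 2 3 1: level codes, not the measurements
as.numeric(as.character(f))    # 112 113 109: the actual values

# data.frame() keeps each column's own type
d2 <- data.frame(Stock, Respiration)
sapply(d2, class)              # Respiration is "numeric"
```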

Related Stack Overflow question (r - aggregate and missing values): https://stackoverflow.com/questions/21510708/

10-12 23:21