I have the following xts object.

library(xts)

x <- structure(c(30440.5, 30441, 30441.5, 30441.5, 30441, 30439.5, 30440.5, 30441,
                 30441.5, NA, NA, 30439.5, NA, NA, NA, 30441.5, 30441, NA), .indexTZ = "",
               class = c("xts", "zoo"), .indexCLASS = c("POSIXct", "POSIXt"),
               tclass = c("POSIXct", "POSIXt"), tzone = "",
               index = structure(c(1519866931.1185, 1519866931.1255, 1519866931.1255,
                                   1519866931.1905, 1519866931.1905, 1519866931.1915),
                                 tzone = "", tclass = c("POSIXct", "POSIXt")),
               .indexFormat = "%Y-%m-%d %H:%M:%OS",
               .Dim = c(6L, 3L), .Dimnames = list(NULL, c("x", "y", "z")))
#                              x        y        z
# 2018-03-01 09:15:31.118  30440.5  30440.5       NA
# 2018-03-01 09:15:31.125  30441.0  30441.0       NA
# 2018-03-01 09:15:31.125  30441.5  30441.5       NA
# 2018-03-01 09:15:31.190  30441.5       NA  30441.5
# 2018-03-01 09:15:31.190  30441.0       NA  30441.0
# 2018-03-01 09:15:31.191  30439.5  30439.5       NA


How do I write a vapply call that computes mean(..., na.rm = TRUE) across each row, so that it returns a single column like this?

                               w
2018-03-01 09:15:31.118  30440.5
2018-03-01 09:15:31.125  30441.0
2018-03-01 09:15:31.125  30441.5
2018-03-01 09:15:31.190  30441.5
2018-03-01 09:15:31.190  30441.0
2018-03-01 09:15:31.191  30439.5


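I just can't get it to work.

I have noticed that many answers recommend against vapply in favor of other functions. But according to this answer, vapply is actually the fastest. So which apply function is best here?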

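Best answer

I wouldn't use vapply if you want the mean across the columns of each row. I would use rowMeans, and note that you have to convert the result back to xts.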

(xmean <- xts(rowMeans(x, na.rm = TRUE), index(x)))
#                        [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
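
The result above has no column name, whereas the question asked for a single column called w. A minimal sketch of one way to get that, using a small toy object (the values and the UTC time zone are assumptions for illustration, not the asker's full data):

```r
library(xts)

# Toy data standing in for the question's object (values are illustrative).
idx <- as.POSIXct("2018-03-01 09:15:31", tz = "UTC") + c(0.118, 0.125, 0.190)
m <- matrix(c(30440.5, 30441.0, 30441.5,   # column x
              30440.5, 30441.0, NA,        # column y
              NA,      NA,      30441.5),  # column z
            ncol = 3, dimnames = list(NULL, c("x", "y", "z")))
x <- xts(m, order.by = idx)

# Row means ignoring NAs, wrapped back into xts on the original index.
w <- xts(rowMeans(x, na.rm = TRUE), order.by = index(x))

# Name the single column as requested in the question.
colnames(w) <- "w"
w
```

Setting the column name after construction keeps the rowMeans call itself unchanged.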


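And I would use apply for generic functions that don't have a specialized implementation. Note that you need to transpose the result if the function returns more than one value.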

(xmin <- as.xts(apply(x, 1, min, na.rm = TRUE), dateFormat = "POSIXct"))
#                        [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
(xrange <- as.xts(t(apply(x, 1, range, na.rm = TRUE)), dateFormat = "POSIXct"))
#                        [,1]    [,2]
# 2018-02-28 19:15:31 30440.5 30440.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30439.5 30439.5
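
To see why the transpose is needed, it can help to look at the shape apply() returns when the function yields two values per row. This is only a sketch on made-up data; the object, its column names, and the "min"/"max" labels are assumptions for illustration. It rebuilds the result with xts(..., order.by = index(x)) rather than as.xts() so that it does not depend on parsing row names.

```r
library(xts)

# Hypothetical 4-row, 2-column series with one missing value.
idx <- as.Date("2018-03-01") + 0:3
x <- xts(matrix(c(1, 2, 3, 4,
                  5, NA, 7, 8), ncol = 2,
                dimnames = list(NULL, c("a", "b"))),
         order.by = idx)

# range() returns two values per row, so apply() produces one COLUMN per input row.
r <- apply(x, 1, range, na.rm = TRUE)
dim(r)  # 2 rows (min, max) by 4 columns (one per observation)

# Transpose to get one row per observation before rebuilding the xts object.
xrange <- xts(t(r), order.by = index(x))
colnames(xrange) <- c("min", "max")
xrange
```

Without t(), the matrix would have 2 rows and could not be aligned with the 4-element index.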


To address the "why not use vapply()" comment, here are some benchmarks (using data from the Code Review Q/A that the OP linked to):

set.seed(21)
xz <- xts(replicate(6, sample(1:100, 1000, replace = TRUE)),
          order.by = Sys.Date() + 1:1000)
xrowmean <- function(x) { xts(rowMeans(x, na.rm = TRUE), index(x)) }
xapply <- function(x) { as.xts(apply(x, 1, mean, na.rm = TRUE), dateFormat = "POSIXct") }
xvapply <- function(x) { xts(vapply(seq_len(nrow(x)), function(i) {
    mean(x[i,], na.rm = TRUE) }, FUN.VALUE = numeric(1)), index(x)) }

library(microbenchmark)
microbenchmark(xrowmean(xz), xapply(xz), xvapply(xz))
# Unit: microseconds
#          expr       min         lq       mean     median         uq       max neval
#  xrowmean(xz)   169.496   188.8505   207.1931   204.2455   219.4945   285.329   100
#    xapply(xz) 33477.542 34203.3260 35698.0503 35076.4655 36821.1320 43910.353   100
#   xvapply(xz) 32709.238 35010.1920 37514.7557 35884.3585 37972.7085 84409.961   100
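
Before reading much into the timings, it is worth confirming that the three functions compute the same values. The check below is a sketch on a smaller, fixed-date data set (the object xs and its size are assumptions, not part of the benchmark). It compares values only, because xapply() rebuilds the index as POSIXct while the other two keep the original Date index.

```r
library(xts)

# Small deterministic data set (hypothetical; smaller than the benchmark's).
set.seed(21)
xs <- xts(replicate(6, sample(1:100, 50, replace = TRUE)),
          order.by = as.Date("2020-01-01") + 1:50)

# Same three implementations as in the benchmark.
xrowmean <- function(x) xts(rowMeans(x, na.rm = TRUE), index(x))
xapply   <- function(x) as.xts(apply(x, 1, mean, na.rm = TRUE), dateFormat = "POSIXct")
xvapply  <- function(x) xts(vapply(seq_len(nrow(x)),
                                   function(i) mean(x[i, ], na.rm = TRUE),
                                   FUN.VALUE = numeric(1)), index(x))

# Strip the xts attributes and compare the numeric values only.
a <- as.numeric(xrowmean(xs))
b <- as.numeric(xapply(xs))
v <- as.numeric(xvapply(xs))
same_values <- isTRUE(all.equal(a, b)) && isTRUE(all.equal(a, v))
same_values
```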


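So, why not use vapply()? It doesn't provide a performance benefit. It is more verbose than the apply() version, and it's not clear that the safety of the "pre-specified return value" buys you much when you control both the type of the object and the function being called. That said, you won't harm yourself by using vapply(). I just prefer apply() for this case.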
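
For context, the "pre-specified return value" safety matters mostly when the function's output length is not under your control. The sketch below uses a made-up helper, non_missing(), that is not part of the original answer and was chosen only because it returns a different number of values per row.

```r
# Hypothetical row summary that may return zero, one, or many values.
# (Not from the original answer; chosen only to trigger a length mismatch.)
non_missing <- function(v) v[!is.na(v)]

m <- matrix(c(1, 2, NA,
              4, NA, NA), nrow = 2, byrow = TRUE)

# apply() accepts whatever comes back; with unequal lengths it returns a list.
loose <- apply(m, 1, non_missing)
is.list(loose)

# vapply() stops as soon as a result does not match FUN.VALUE.
strict <- tryCatch(
  vapply(seq_len(nrow(m)), function(i) non_missing(m[i, ]),
         FUN.VALUE = numeric(1)),
  error = function(e) "vapply error"
)
strict
```

With mean() or min() the result is always a single number, so this check never fires, which is the point made above.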
