r - Summarising monthly series with data.table (counting specific events)

I hope this is an acceptable R / data.table question.

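I have a table with three columns:

id: the geographic location ID (303,453 locations)

month: the month of the observation, covering 25 years, 1990-2014

spei: a climate index that varies between -7 and 7.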

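I need to count the drought occurrences at each location over the whole 1990-2014 period. A drought event is defined as "a period in which the SPEI is continuously negative and the SPEI reaches a value of -1.0 or less. The drought begins when the SPEI first falls below zero and ends with the first positive SPEI value following a value of -1.0 or less".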

I know this should be feasible using shift() and rolling joins, but any help is very welcome!

# Sample table structure
dt <- data.table(
  id = rep(1:303453, each=25*12),
  month = rep(seq(as.Date("1990-01-01"), as.Date("2014-12-31"), "month"), 303453),
  spei = runif(303453*25*12, -7, 7))

# A minimal example with 1 location over 12 months
library(data.table)
library(xts)

dt <- data.table(
  id = rep("loc1", each=12),
  month = seq(as.Date("2014-01-01"), as.Date("2014-12-31"), "month"),
  spei = c(-2, -1.1, -0.5, 1.2, -1.2, 2.3, -1.7, -2.1, 0.9, 1.2, -0.9, -0.2))

spei.ts <- xts(dt$spei, order.by=dt$month, frequency="month")
plot(spei.ts, type="bars")

This shows 3 drought events over the 1-year period. These are what I need to identify and count.

I hope some of you are used to working with time series.
Many thanks, -Mel.

Best answer

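Here is a starting point for getting the desired result.
Experts may be able to suggest speed improvements.

Edit: removing paste made this about 8 times faster.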

library(data.table)
set.seed(42)
n <- 300  # 303453 will be ~1000 times slower
dt <- data.table(
    id = rep(1:n, each=25*12),
    month = rep(seq(as.Date("1990-01-01"), as.Date("2014-12-31"), "month"), n),
    spei = runif(n*25*12, -7, 7))

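system.time({
  # Flag negative months and months at or below the -1.0 threshold
  dt[, `:=`(neg = (spei < 0), neg1 = (spei <= -1))]
  # Give each run of negative months its own id; non-negative months get NA
  dt[, runid := ifelse(neg, rleid(neg), NA)]
  # Summarise each negative run per location; a run counts as a drought only
  # if it reaches -1.0 or below, otherwise its length is NA and it is dropped
  res <- dt[!is.na(runid),
            .(length = if (any(neg1)) .N else NA_integer_,
              start = min(month), end = max(month)),
            by = .(id, runid)][!is.na(length)]
})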
#    user  system elapsed
#   0.345   0.000   0.344

# counts of droughts per id:
res[, .(nDroughts = .N), by = id]

# list of droughts per id (NB: end is the last negative month; the 1st positive value after it is not included)
res[, .(droughtN = seq_len(.N), start, end), by = id]
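
As a sanity check, the same run-length idea can be applied to the 12-month example from the question, which should yield 3 droughts. This is only a sketch of a variant, not the code timed above: it computes rleid() within each id and filters on the minimum SPEI of each run, and the column name minSpei is made up for this illustration.

```r
library(data.table)

# The 1-location, 12-month example from the question
dt <- data.table(
  id = rep("loc1", 12),
  month = seq(as.Date("2014-01-01"), as.Date("2014-12-31"), "month"),
  spei = c(-2, -1.1, -0.5, 1.2, -1.2, 2.3, -1.7, -2.1, 0.9, 1.2, -0.9, -0.2))

# Number the runs of consecutive negative / non-negative months per location
dt[, runid := rleid(spei < 0), by = id]

# Summarise the negative runs and keep those that reach -1.0 or below
# (minSpei is a hypothetical helper column, not part of the answer above)
droughts <- dt[spei < 0,
               .(length = .N, start = min(month), end = max(month),
                 minSpei = min(spei)),
               by = .(id, runid)][minSpei <= -1]

# Number of droughts per location
droughts[, .(nDroughts = .N), by = id]
```

The last negative run (-0.9, -0.2) never reaches -1.0, so it is filtered out and three runs remain. As in the answer, end is the last negative month rather than the first positive month that follows.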