I have some problems with the OLAP cube package data.cube:

install.packages("data.cube", repos = paste0("https://", c(
    "jangorecki.gitlab.io/data.cube",
    "cloud.r-project.org"
)))

Some test data:
library(data.table)
set.seed(42)

dt <- CJ(color = c("green","yellow","red"),
         year = 2011:2015,
         month = 1:12,
         status = c("active","inactive","archived","removed")
)[sample(600)]

dt[, "value" := sample(4:7/2, nrow(dt), TRUE)]

Now I want to create a cube and apply a hierarchy to the time dimension. Something like this:
library(data.cube)
dc <- as.data.cube(dt, id.vars = c("color", "year", "month", "status"),
                   measure.vars = "value",
                   hierarchies = list(time <- list("year, month")))

If I run this code, I get the following error:
Error in as.data.cube.data.table(dt, id.vars = c("color", "year", "month",  :
  identical(names(hierarchies), id.vars) | identical(names(hierarchies),  .... is not TRUE

If I try something like
hierarchies = list(time <- list("year, month"), color <- list("color"),
                  status <- list("status"))

I get the same error.

Best answer

Well-written question.
I see that you based your example on the ?as.data.cube example, so I will also use that example to answer your question.

# Original example goes as follows
library(data.cube)
library(data.table)
set.seed(1L)
dt = CJ(color = c("green","yellow","red"),
        year = 2011:2015,
        status = c("active","inactive","archived","removed"))[sample(30)]
dt[, "value" := sample(4:7/2, nrow(dt), TRUE)]

dc = as.data.cube(
  x = dt, id.vars = c("color","year","status"),
  measure.vars = "value",
  hierarchies = sapply(c("color","year","status"),
                       function(x) list(setNames(list(character()), x)),
                       simplify=FALSE)
)
str(dc)

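Your error appears to be raised while the validity of the hierarchies is being checked.
Unfortunately the error message is not very meaningful; I created issue #18 so that it gets improved someday.
So let's compare the hierarchies from the manual with the ones created in your example.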
sapply(c("color","year","status"),
       function(x) list(setNames(list(character()), x)),
       simplify=FALSE) -> h
str(h)
#List of 3
# $ color :List of 1
#  ..$ :List of 1
#  .. ..$ color: chr(0)
# $ year  :List of 1
#  ..$ :List of 1
#  .. ..$ year: chr(0)
# $ status:List of 1
#  ..$ :List of 1
#  .. ..$ status: chr(0)
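The names in h come from sapply() itself: when X is a character vector and USE.NAMES is TRUE (the default), sapply() names each result after the corresponding element of X. A minimal base R sketch (no data.cube needed) showing that the manual's construction is equivalent to naming an lapply() result explicitly with setNames():

```r
# Dimension names used in the ?as.data.cube example
dims <- c("color", "year", "status")

# The manual's construction: sapply() with simplify = FALSE returns a list,
# and because X is character, USE.NAMES = TRUE names it after dims
h <- sapply(dims,
            function(x) list(setNames(list(character()), x)),
            simplify = FALSE)
names(h)  # "color" "year" "status"

# The same list built explicitly: lapply() does not add names, setNames() does
h2 <- setNames(lapply(dims, function(x) list(setNames(list(character()), x))),
               dims)
identical(h, h2)  # TRUE
```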

hierarchies = list(time <- list("year, month"), color <- list("color"),
                   status <- list("status"))
str(hierarchies)
#List of 3
# $ :List of 1
#  ..$ : chr "year, month"
# $ :List of 1
#  ..$ : chr "color"
# $ :List of 1
#  ..$ : chr "status"

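We can see that the hierarchies in the manual are a list of named elements, while your example is a list of unnamed elements.
I believe you used <- where you should have used =. The <- operator is not always equivalent to =: inside a function call such as list(), <- performs an assignment and produces an unnamed argument, while = gives the argument a name. You can read more about this case in 3.1.3.1 Assignment <- vs =.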
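A minimal base R sketch of the difference, using the time element from the question (the id.vars vector is copied from the question; nothing here calls data.cube):

```r
# With `<-`, R evaluates an assignment: it creates a variable `time` in the
# current environment and passes its value as an UNNAMED list element
h_arrow <- list(time <- list("year, month"))

# With `=`, `time` becomes the element NAME and no variable is created
h_equal <- list(time = list("year, month"))

names(h_arrow)  # NULL: the list has no names at all
names(h_equal)  # "time"

# A list without names can never match id.vars, which is what the
# identical(names(hierarchies), id.vars) part of the check compares
identical(names(h_arrow), c("color", "year", "month", "status"))  # FALSE
```

Running the first line also leaves a stray `time` object in the workspace as a side effect of the assignment.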

So let's see whether that fix is enough:
hierarchies = list(time = list(c("year, month")), color = list("color"),
                   status = list("status"))

dc <- as.data.cube(dt, id.vars = c("color", "year", "month", "status"),
                   measure.vars = "value",
                   hierarchies = hierarchies)

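We still get the same error, so the names were required but they are not the root cause of the problem. Looking more closely, I now see that you want to build a time dimension without a primary key.
An important note: you cannot pass multiple column names as a single string, so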
"year, month"

should be written as
c("year","month")
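A small base R sketch of why the single string does not work: it is one column name that happens to contain a comma, not two names. The one-row data.frame is made up for the illustration.

```r
one_string <- "year, month"
two_names  <- c("year", "month")

length(one_string)  # 1: a single name containing a comma and a space
length(two_names)   # 2: two separate column names

# No table has a column literally called "year, month"
df <- data.frame(year = 2011L, month = 5L)
one_string %in% names(df)      # FALSE
all(two_names %in% names(df))  # TRUE

# If you start from such a string, split it explicitly
strsplit(one_string, ",\\s*")[[1]]  # "year" "month"
```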

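We still need the primary key of the time dimension to be a single field, with year and month being only its attributes.
So let's create a primary key for the time dimension. Because our time dimension has year-month granularity, we will create the key at that granularity.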
library(data.table)
set.seed(42)

dt <- CJ(color = c("green","yellow","red"),
         year = 2011:2015,
         month = 1:12,
         status = c("active","inactive","archived","removed")
)[sample(600)
  ][, yearmonth:=sprintf("%04d%02d", year, month) # this ensures four digits for the year and two digits for the month
    ]

dt[, "value" := sample(4:7/2, nrow(dt), TRUE)]
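A base R sketch of what the sprintf("%04d%02d", ...) key guarantees. It uses expand.grid() instead of data.table's CJ() only so that it runs without extra packages. Each (year, month) pair gets exactly one six-character key, year and month can be read back from that key (they depend on it), and the zero padding keeps the text order chronological.

```r
# All year-month combinations from the test data (5 years x 12 months)
grid <- expand.grid(year = 2011:2015, month = 1:12)
grid$yearmonth <- sprintf("%04d%02d", grid$year, grid$month)

# One key per (year, month) pair, always six characters long
n_keys <- length(unique(grid$yearmonth))  # 60
all(nchar(grid$yearmonth) == 6L)          # TRUE

# year and month are recoverable from the key, i.e. they depend on yearmonth
all(as.integer(substr(grid$yearmonth, 1, 4)) == grid$year)   # TRUE
all(as.integer(substr(grid$yearmonth, 5, 6)) == grid$month)  # TRUE

# Zero padding keeps the lexical order chronological
padded   <- sprintf("%04d%02d", 2011L, c(2L, 10L))  # "201102" "201110"
unpadded <- sprintf("%d%d",     2011L, c(2L, 10L))  # "20112"  "201110"
sort(padded)    # February before October
sort(unpadded)  # "201110" before "20112": October sorts before February
```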

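Now let's define the hierarchies; note that year has been replaced by yearmonth.
In the hierarchies below, the value vector c("year","month") means that these attributes depend on yearmonth. For more complex hierarchy cases, see the further examples in ?as.data.cube.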
hierarchies = list(
  color = list(color = list(color = character())),
  yearmonth = list(yearmonth = list(yearmonth = c("year","month"))),
  status = list(status = list(status = character()))
)
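Before calling as.data.cube(), it can help to check the hierarchy list by hand. The helper check_hierarchy_names() below is hypothetical and not part of data.cube: it reproduces only the visible half of the failing condition, identical(names(hierarchies), id.vars). The rest of that condition is truncated in the error message, so the real check may accept other cases. The sketch also pulls out the attribute vector stored under each dimension key.

```r
# Hypothetical helper, NOT a data.cube function: mirrors only the visible
# part of the validity check from the error message
check_hierarchy_names <- function(hierarchies, id.vars) {
  identical(names(hierarchies), id.vars)
}

hierarchies <- list(
  color = list(color = list(color = character())),
  yearmonth = list(yearmonth = list(yearmonth = c("year", "month"))),
  status = list(status = list(status = character()))
)

check_hierarchy_names(hierarchies, c("color", "yearmonth", "status"))       # TRUE
check_hierarchy_names(hierarchies, c("color", "year", "month", "status"))   # FALSE

# Attributes hanging off each dimension key (first level, first element)
attrs <- lapply(hierarchies, function(lvl) lvl[[1]][[1]])
attrs$yearmonth  # "year" "month"
```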

dc = as.data.cube(
  x = dt, id.vars = c("color","yearmonth","status"),
  measure.vars = "value",
  hierarchies = hierarchies
)
str(dc)

Our data.cube has been created successfully. Let's try querying it by the yearmonth key:
dc[, .(yearmonth=201105L)] -> d
as.data.table(d)
dc[, .(yearmonth=201105L), drop=FALSE] -> d
as.data.table(d)

Now let's query it using the attributes of the time dimension, year and month:
dc[, .(year=2011L)] -> d
as.data.table(d) # note that the dimension is not dropped because it still has more than one value
dc[, .(month=5L)] -> d
as.data.table(d)
dc[, .(year=2011L, month=5L)] -> d
as.data.table(d) # here the dimension has been dropped because only a single element was left in it; you can of course use `drop=FALSE` if needed
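The drop behaviour is analogous to base R arrays, where subsetting a dimension down to a single value removes that dimension unless drop = FALSE is given. This is only an analogy to illustrate the idea; it does not show how data.cube implements dropping, and the array and its dimnames are made up for the example.

```r
# A made-up 2 x 3 x 4 array standing in for a small cube
arr <- array(1:24, dim = c(2, 3, 4),
             dimnames = list(color = c("green", "red"),
                             yearmonth = c("201101", "201102", "201103"),
                             status = c("active", "inactive", "archived", "removed")))

dim(arr[, "201102", ])                # 2 4: the single-valued dimension is dropped
dim(arr[, "201102", , drop = FALSE])  # 2 1 4: kept with extent 1
dim(arr[, c("201101", "201102"), ])   # 2 2 4: two values remain, nothing dropped
```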

Hope this helps, good luck!

A related question about defining hierarchies in an R data.cube can be found on Stack Overflow: https://stackoverflow.com/questions/52816087/

10-12 17:46