rbind a list of vectors with different lengths

Question

I am new to R and am trying to build a frequency/severity simulation. Everything works, except that it takes about 10 minutes to run 10000 simulations for each of 700 locations. For the simulation of a single location, I get a list of vectors of varying lengths, and I would like to rbind these vectors efficiently, filling in NA for all missing values. I would like R to return a data.frame. So far, I have used rbind.fill.matrix after converting each vector in the list to a one-row matrix. However, I am hoping I could use something like bind_rows (dplyr) or rbind.fill, but I don't know how to transform the vectors into something these functions accept. Thank you in advance for your help!

set.seed(1223)

library(data.table)

numsim = 10

rN.D <- function(numsim) rpois(numsim, 4)
rX.D <- function(numsim) rnorm(numsim, mean = 5, sd = 4)

freqs <- rN.D(numsim)
obs <- lapply(freqs, function(x) rX.D(x))
#obs is the list that I would like to rbind (efficiently!) and have a data.frame returned to me

Recommended answer

If your real application uses rnorm or similar, you can make a single call to it:

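set.seed(1223)
library(data.table)

# rN.D and rX.D are the functions defined in the question
numsim = 3e5
freqs = rN.D(numsim)
maxlen = max(freqs)

# NA-filled matrix with one column per simulation
m = matrix(NA_real_, maxlen, numsim)
# fill the first freqs[j] rows of column j from a single rnorm call
m[row(m) <= freqs[col(m)]] <- rX.D(sum(freqs))

# transpose so that each simulation becomes a row
res = as.data.table(t(m))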

I am filling the data the "wrong way" (with each simulation on a column instead of a row) and then transposing since R fills matrix values using "column-major" order.
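To make the column-major fill concrete, here is a minimal sketch on toy data. The inputs `freqs` and `vals` are made-up values chosen for illustration (they are not from the question), and `vals` stands in for the single `rnorm` call:

```r
# Hypothetical toy inputs: three simulations with 2, 0 and 3 values each
freqs <- c(2, 0, 3)
vals  <- c(10, 20, 30, 40, 50)   # stand-in for rX.D(sum(freqs))

# One column per simulation, padded with NA
m <- matrix(NA_real_, nrow = max(freqs), ncol = length(freqs))

# The logical mask is TRUE for the first freqs[j] rows of column j;
# assignment walks the TRUE cells column by column (column-major order)
m[row(m) <= freqs[col(m)]] <- vals

# Transpose so that each simulation is a row
res <- t(m)
print(res)
# row 1: 10 20 NA
# row 2: NA NA NA
# row 3: 30 40 50
```

Because the values are consumed column by column, each simulation's draws stay contiguous, which is why the matrix is built "sideways" first.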

If you need to use lapply, here's a benchmark for the final step:

set.seed(1223)

library(dplyr); library(tidyr); library(purrr)
library(data.table)

numsim = 3e5

rN.D <- function(numsim) rpois(numsim, 4)
rX.D <- function(numsim) rnorm(numsim, mean = 5, sd = 4)

freqs <- rN.D(numsim)
obs <- lapply(freqs, function(x) rX.D(x))

system.time({
tidyres = obs %>%
   set_names(seq_along(.)) %>%
   stack %>%
   group_by(ind) %>%
   mutate(Col = paste0("Col", row_number())) %>%
   spread(Col, values)
})
#    user  system elapsed
#   16.56    0.31   16.88

system.time({
    out <- do.call(rbind, lapply(obs, `length<-`, max(lengths(obs))))
    bres = as.data.frame(out)
})
#    user  system elapsed
#    0.50    0.05    0.55

system.time(
    dtres <- setDT(transpose(obs))
)
#    user  system elapsed
#    0.03    0.01    0.05

The last approach is by far the fastest in this benchmark (0.05 s elapsed, versus 0.55 s and 16.88 s); the other two both come from @akrun's answer.
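For readers unfamiliar with the base R idiom in the second benchmark, here is a small sketch of what it does. The toy list `obs` below is a made-up input, not the simulated data from the question:

```r
# Hypothetical toy input: vectors of length 2, 0 and 3
obs <- list(c(1, 2), numeric(0), c(3, 4, 5))

# `length<-` extends each vector to the common maximum length, padding with NA
maxlen <- max(lengths(obs))
padded <- lapply(obs, `length<-`, maxlen)

# The padded vectors now have equal length, so rbind stacks them into a matrix
out  <- do.call(rbind, padded)
bres <- as.data.frame(out)
print(bres)
```

Each element of the list becomes one row of `bres`, and shorter vectors are right-padded with `NA`.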

Comment. I would recommend using only data.table or only the tidyverse. Mixing and matching gets messy very quickly. When I was setting this example up, I saw that purrr has its own transpose function, so if you load the packages in a different order, code like this can give different results without warning.
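One way to sidestep the load-order problem, if both packages have to be attached, is to qualify the call with its namespace. This is a sketch on a made-up toy list, not part of the original benchmark:

```r
library(data.table)

# Hypothetical toy input: vectors of length 2, 0 and 3
obs <- list(c(1, 2), numeric(0), c(3, 4, 5))

# data.table::transpose is named explicitly, so the result does not
# depend on whether purrr was attached before or after data.table
dtres <- setDT(data.table::transpose(obs))
print(dtres)
```

The qualified call is slightly more verbose, but it makes it obvious which `transpose` is intended.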
