问题描述
只是想知道是否有人遇到过他/她需要从非常高维的多元正态分布(比如维数 = 10,000)中随机抽取的问题,作为 rmvnorm
函数>mvtnorm 包是不切实际的.
Just wondering if anyone has ever encountered the problem where he/she needs to randomly draw from a very high dimensional multivariate normal distribution (say dimension = 10,000), as the rmvnorm
function of the mvtnorm
package is impractical for that.
我知道这篇文章有一个Rcpp
实现mvnorm
包的 dmvnorm
函数,所以我想知道 rmvnorm
是否存在等效的东西?
I know this article has an Rcpp
implementation for the dmvnorm
function of the mvtnorm
package, so I was wondering if something equivalent exists for rmvnorm
?
推荐答案
这里是 mvtnorm::rmvnorm
和 Rcpp
实现的快速比较 此处 作者:Ahmadou Dicko.显示的时间是从多元正态分布中抽取 100 次的时间,维度范围从 500 到 2500.从下图中您可以推断出维度为 10000 所需的时间.时间包括生成随机 mu的开销code> 向量和
diag
矩阵,但这些在不同方法中是一致的,并且对于所讨论的维度来说是微不足道的(例如 diag(10000)
为 0.2 秒).
Here's a quick comparison of mvtnorm::rmvnorm
and an Rcpp
implementation given here by Ahmadou Dicko. The times presented are for 100 draws from a multivariate normal distribution with dimension ranging from 500 to 2500. From the graph below you can probably infer the time required for dimension of 10000. Times include the overhead of generating the random mu
vector and the diag
matrix, but these are consistent across approaches and are trivial for the dimensions in question (e.g. 0.2 sec for diag(10000)
).
library(Rcpp)
library(RcppArmadillo)
library(inline)
library(mvtnorm)
code <- '
using namespace Rcpp;
int n = as<int>(n_);
arma::vec mu = as<arma::vec>(mu_);
arma::mat sigma = as<arma::mat>(sigma_);
int ncols = sigma.n_cols;
arma::mat Y = arma::randn(n, ncols);
return wrap(arma::repmat(mu, 1, n).t() + Y * arma::chol(sigma));
'
rmvnorm.rcpp <-
cxxfunction(signature(n_="integer", mu_="numeric", sigma_="matrix"), code,
plugin="RcppArmadillo", verbose=TRUE)
rcpp.time <- sapply(seq(500, 5000, 500), function(x) {
system.time(rmvnorm.rcpp(100, rnorm(x), diag(x)))[3]
})
mvtnorm.time <- sapply(seq(500, 2500, 500), function(x) {
system.time(rmvnorm(100, rnorm(x), diag(x)))[3]
})
plot(seq(500, 5000, 500), rcpp.time, type='o', xlim=c(0, 5000),
ylim=c(0, max(mvtnorm.time)), xlab='dimension', ylab='time (s)')
points(seq(500, 2500, 500), mvtnorm.time, type='o', col=2)
legend('topleft', legend=c('rcpp', 'mvtnorm'), lty=1, col=1:2, bty='n')
这篇关于从多元正态分布中有效地随机抽取的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!