Problem description
I have an integral to evaluate:
"x^(-0.5)" ; x in [0.01,1]
for which I am using importance sampling Monte Carlo. The theory says that an approximate PDF has to be used to compute the expected value (which will almost surely converge to the mean value of the integral).
After plotting the given integrand and the exponential PDF, and based only on the plots, I chose rexp and dexp to generate the PDF. My code looks like this:
#Without Importance Sampling
set.seed(1909)
X <- runif(1000,0.01,1)
Y <- X^(-0.5)
c( mean(Y), var(Y) )
#Importance sampling Monte Carlo
w <- function(x) dunif(x, 0.01, 1)/dexp(x,rate=1.5)
f <- function(x) x^(-0.5)
X= rexp(1000,rate=1.5)
Y=w(X)*f(X)
c( mean(Y), var(Y) )
Could someone please confirm whether my line of thought is correct? If it is wrong, how should I approach this differently? Please elucidate: I have understood the theory, but the implementation is proving to be problematic for me.
For integrals that are not so simple:
1.) f(x) = [1+sinh(2x)ln(x)]^-1
I chose the normal PDF as g(x) (with mean = 0.5 and SD = 5) as the approximation, only after observing the plot. I wrote code similar to the above for it, but it reports that NaNs are produced in the importance sampling case. (This ideally means an undefined function, but I don't know how to solve this.)
2.) f(x,y) = exp(-x^4 - y^4)
How do I choose g(x,y) for the above function?
Recommended answer
Generally your approach seems to be correct, but you have to be more careful with the domain over which you want to integrate. In your original example, about 20% of the values from rexp(1000, 1.5) are above 1. The function dexp(x, rate=1.5) is not a density function on the interval [0,1]. You have to divide by pexp(1, rate=1.5). So here is what I would do for the importance sampling example:
#Importance sampling Monte Carlo
w <- function(x) dunif(x, 0.01, 1)/dexp(x,rate=1.5) * pexp(1, rate=1.5)
f <- function(x) x^(-0.5)
X <- rexp(1000,rate=1.5)
X <- X[X<=1]
Y <- w(X)*f(X)
c(mean(Y), var(Y))
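As a sanity check, this target has a closed form: the integral of x^(-0.5) over [0.01, 1] is 2*(1 - 0.1) = 1.8, so the mean under the uniform density on [0.01, 1] is 1.8/0.99, roughly 1.818. The sketch below compares the plain and the importance sampling estimators against that value. The sample size n and the names exact, est_plain and est_is are illustrative choices, not part of the original code.

```r
set.seed(1909)
n <- 1e5

f <- function(x) x^(-0.5)
# Exact mean of f under Uniform(0.01, 1): (integral of f) / (interval length)
exact <- 2 * (sqrt(1) - sqrt(0.01)) / (1 - 0.01)

# Plain Monte Carlo with uniform draws
est_plain <- mean(f(runif(n, 0.01, 1)))

# Importance sampling with an exponential proposal truncated to [0, 1]
w <- function(x) dunif(x, 0.01, 1) / dexp(x, rate = 1.5) * pexp(1, rate = 1.5)
X <- rexp(n, rate = 1.5)
X <- X[X <= 1]                # rejection step: keep draws inside [0, 1]
est_is <- mean(w(X) * f(X))

c(exact = exact, plain = est_plain, importance = est_is)
```

Both estimates should land close to the exact value; comparing var() of the two weighted samples then shows whether the proposal actually reduces the variance.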
In your second example the same thing causes the problem. You get negative X and therefore NaN values for log(X). Furthermore, your normal density should be centered at 0.5 with less variance. Here's my approach:
#Without Importance Sampling
set.seed(1909)
X <- runif(1000,0.01,1)
Y <- (1+sinh(2*X)*log(X))^(-1)
c(mean(Y), var(Y))
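#Importance sampling Monte Carlo
# Proposal: N(0.5, 0.25) truncated to [0,1]; by symmetry its mass there is 1-2*pnorm(0, ...)
w <- function(x) dunif(x, 0.01, 1)/dnorm(x, mean=0.5, sd=0.25) * (1-2*pnorm(0, mean=0.5, sd=0.25))
f <- function(x) (1+sinh(2*x)*log(x))^(-1)
X <- rnorm(1000, mean=0.5, sd=0.25)
# Keep only draws in (0,1]: negative draws give NaN in log(x), and draws above 1
# would otherwise stay in the sample as zeros and bias the mean downwards
X <- X[X>0 & X<=1]
Y <- w(X)*f(X)
c(mean(Y), var(Y))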
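This integrand has no convenient closed form, but numerical quadrature gives a reference value to check the estimator against. The following is a minimal sketch under that idea; ref, mass01, est_is and the sample size n are names and settings introduced here for illustration.

```r
set.seed(1909)
n <- 1e5

f <- function(x) (1 + sinh(2 * x) * log(x))^(-1)
# Reference value: mean of f under Uniform(0.01, 1), via numerical quadrature
ref <- integrate(f, 0.01, 1)$value / (1 - 0.01)

# Proposal: N(0.5, 0.25) truncated to [0, 1]
mass01 <- pnorm(1, mean = 0.5, sd = 0.25) - pnorm(0, mean = 0.5, sd = 0.25)
w <- function(x) dunif(x, 0.01, 1) / dnorm(x, mean = 0.5, sd = 0.25) * mass01

X <- rnorm(n, mean = 0.5, sd = 0.25)
X <- X[X > 0 & X <= 1]        # drop draws outside the truncation interval
est_is <- mean(w(X) * f(X))

c(reference = ref, importance = est_is)
```

Filtering X before evaluating f avoids the NaN warnings from log() altogether, and it keeps the sample consistent with the truncated proposal that the weight w assumes.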
In your example 2.), I don't really understand what y is. Is it just a constant? Then perhaps a Weibull distribution may work.
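If y is instead a second integration variable, one possible reading is a two-dimensional integral of exp(-x^4 - y^4). The question does not state the domain, so the sketch below assumes the whole plane, and it uses two independent normal coordinates as the proposal g(x,y). The standard deviation s = 0.6 is an untuned guess, and all names are illustrative. Note that here the estimate targets the integral itself, since the weight is f/g with no uniform density involved.

```r
set.seed(1909)
n <- 1e5
s <- 0.6   # proposal standard deviation (an assumed, untuned choice)

f <- function(x, y) exp(-x^4 - y^4)
# Proposal density: two independent N(0, s) coordinates
g <- function(x, y) dnorm(x, 0, s) * dnorm(y, 0, s)

X <- rnorm(n, 0, s)
Y <- rnorm(n, 0, s)
Z <- f(X, Y) / g(X, Y)        # importance-weighted integrand

# Closed form for comparison: (integral of exp(-t^4) over the real line)^2
exact <- (2 * gamma(5/4))^2

c(exact = exact, estimate = mean(Z), variance = var(Z))
```

A normal proposal is workable here because exp(-t^4) has lighter tails than any normal density, so the ratio f/g stays bounded.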
Regarding your additional questions in the comments: (1) Any probability density function should integrate to 1. Therefore dexp(x, rate=1.5) is not a density function on the interval [0,1]; it only integrates to pexp(1, rate=1.5). However, the function
dexp01 <- function(x, rate){
dexp(x, rate=rate)/pexp(1, rate=rate)
}
actually integrates to 1:
integrate(dexp, 0, 1, rate=1.5)
integrate(dexp01, 0, 1, rate=1.5)
That's the rationale for including the cumulative distribution function pexp. If you have a different interval, e.g. [0.3,8], you have to adjust the function accordingly:
dexp0.3_8 <- function(x, rate){
dexp(x, rate=rate)/(pexp(8, rate=rate)-pexp(0.3, rate=rate))
}
integrate(dexp0.3_8, 0.3, 8, rate=1.5)
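The same normalisation can be written once for an arbitrary interval [a, b], and the matching sampler can avoid rejection by inverting the CDF. This is a sketch only: dexp_trunc and rexp_trunc are hypothetical helper names, not functions from base R or from the original answer.

```r
# Density of an exponential distribution truncated to [a, b]
dexp_trunc <- function(x, rate, a, b) {
  inside <- x >= a & x <= b
  inside * dexp(x, rate = rate) / (pexp(b, rate = rate) - pexp(a, rate = rate))
}

# Draw from the truncated distribution by inverting the CDF:
# map uniforms on [F(a), F(b)] back through the quantile function
rexp_trunc <- function(n, rate, a, b) {
  u <- runif(n, pexp(a, rate = rate), pexp(b, rate = rate))
  qexp(u, rate = rate)
}

set.seed(1909)
X <- rexp_trunc(1000, rate = 1.5, a = 0.3, b = 8)
range(X)   # draws should fall within [0.3, 8]
total <- integrate(function(x) dexp_trunc(x, 1.5, 0.3, 8), 0.3, 8)$value
total      # should be close to 1
```

Compared with drawing from rexp and discarding values outside the interval, the inverse-CDF approach returns exactly n draws, which matters when the interval carries little of the proposal's mass.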
(2) Here I chose the standard deviation so that approximately 95% of the values in rnorm(1000, .5, .25) are in the interval from 0 to 1 (having many values outside this interval would certainly increase the variance). However, I am not certain that this is the best choice of distribution function. The selection of the importance function is a problem that I am not very familiar with. You could ask on CrossValidated. The same goes for your next question.
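One empirical way to explore this choice is to estimate the variance of the weighted integrand for a few candidate standard deviations and compare them. The sketch below does that for the integrand from example 1.); the candidate values in sds and the helper name is_variance are arbitrary illustrations, not a recommendation.

```r
set.seed(1909)
n <- 1e5
f <- function(x) (1 + sinh(2 * x) * log(x))^(-1)

# Variance of the weighted integrand for a N(0.5, s) proposal truncated to [0, 1]
is_variance <- function(s) {
  mass01 <- pnorm(1, 0.5, s) - pnorm(0, 0.5, s)
  X <- rnorm(n, 0.5, s)
  X <- X[X > 0 & X <= 1]      # keep draws inside the truncation interval
  Y <- dunif(X, 0.01, 1) / dnorm(X, 0.5, s) * mass01 * f(X)
  var(Y)
}

sds <- c(0.15, 0.25, 0.5, 1)
res <- data.frame(sd = sds, variance = sapply(sds, is_variance))
res
```

A proposal that is too narrow puts very large weights on rare draws near the ends of the interval, while one that is too wide behaves almost like uniform sampling, so the smallest variance is usually somewhere in between.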