This article covers the precision of scipy.integrate.quad when the integration limits are large.

Problem description

I am trying to compute the following integral with scipy.integrate.quad(). It is the CDF of an exponential distribution, obtained by integrating its PDF:

import numpy as np
from scipy.integrate import quad

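def g(x):
    # PDF of an exponential distribution with rate 0.5
    return .5 * np.exp(-.5 * x)

# Integrate from 0 up to increasingly smaller upper limits
print(quad(g, a=0., b=np.inf))
print(quad(g, a=0., b=10**6))
print(quad(g, a=0., b=10**5))
print(quad(g, a=0., b=10**4))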

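The results are as follows (each tuple is the integral estimate followed by the estimated absolute error):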

(1.0, 3.5807346295637055e-11)
(0.0, 0.0)
(3.881683817604194e-22, 7.717972744764185e-22)
(1.0, 1.6059202674761255e-14)

Using np.inf as the upper limit gives the correct answer, and so does 10**4, but the larger finite upper limits 10**5 and 10**6 yield incorrect results.

A similar case is discussed in SciPy issue #5428 on GitHub.

What should I do to avoid this error when integrating other density functions?

Recommended answer

I believe the issue is that np.exp(-x) quickly becomes very small as x increases and eventually evaluates to zero because of limited numerical precision. For example, even for x as small as x=10**2, np.exp(-x) evaluates to 3.72007597602e-44, whereas x values of order 10**3 or above result in 0.
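The underflow described above is easy to check directly. The following is a minimal sketch; the variable names are illustrative only and not part of the original question.

```python
import numpy as np

# exp(-x) is still representable as a float64 at x = 100 ...
at_100 = np.exp(-100.0)

# ... but it underflows to exactly 0.0 at x = 1000.
at_1000 = np.exp(-1000.0)

# The integrand in the question uses exp(-x/2), so it reaches
# exactly zero at roughly twice the x where exp(-x) does.
g_at_2000 = 0.5 * np.exp(-0.5 * 2000.0)

print(at_100, at_1000, g_at_2000)
```

Once the integrand is exactly zero, no amount of further sampling in that region can recover a contribution to the integral.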

I do not know the implementation specifics of quad, but it probably samples the integrand at a number of points within the given integration range. For a large upper integration limit, most of the samples of np.exp(-x) evaluate to zero, so the integral value is underestimated. (Note that for the upper limit 10**5 the absolute error reported by quad is of the same order as the integral value, which indicates that the value is unreliable. For 10**6 both the value and the reported error are zero, so the error estimate gives no warning at all.)
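One way to test the sampling explanation is to record every point at which quad evaluates the integrand. The sketch below wraps the integrand from the question in a hypothetical logging function, g_logged, and integrates up to 10**6, the case that returned (0.0, 0.0) in the question.

```python
import numpy as np
from scipy.integrate import quad

sampled = []  # (x, g(x)) for every evaluation quad requests

def g_logged(x):
    # Same integrand as in the question, with logging added.
    y = 0.5 * np.exp(-0.5 * x)
    sampled.append((x, y))
    return y

value, abs_err = quad(g_logged, 0.0, 10**6)

smallest_x = min(x for x, _ in sampled)
nonzero = [x for x, y in sampled if y != 0.0]

print("estimate:", value, "reported error:", abs_err)
print("evaluations:", len(sampled), "smallest x sampled:", smallest_x)
print("evaluations with a nonzero integrand:", len(nonzero))
```

If no sample lands in the narrow region near 0 where the integrand is nonzero, quad sees a function that is identically zero and has no reason to subdivide the interval further.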

One approach to avoid this issue is to restrict the upper integration bound to a value above which the function becomes negligibly small (and hence contributes only marginally to the integral value). From your code snippet, the value of 10**4 appears to be a good choice; however, a value of 10**2 also results in an accurate evaluation of the integral.
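For the exponential density in the question, a suitable cutoff can be derived rather than guessed: the tail mass beyond b is exp(-rate * b), so requiring it to be below a tolerance tol gives b = -ln(tol) / rate. The sketch below assumes this specific density; the names RATE and cutoff are illustrative, and other densities need their own tail bound.

```python
import numpy as np
from scipy.integrate import quad

RATE = 0.5  # rate of the exponential density used in the question

def g(x):
    return RATE * np.exp(-RATE * x)

def cutoff(tol):
    # Tail mass beyond b is exp(-RATE * b); solve exp(-RATE * b) = tol for b.
    return -np.log(tol) / RATE

b = cutoff(1e-16)               # the tail beyond b contributes about 1e-16
value, abs_err = quad(g, 0.0, b)

print("upper bound:", b)
print("estimate:", value, "reported error:", abs_err)
```

Keeping the interval this short means the sample points fall where the integrand actually has mass, which is what the adaptive routine needs to produce a trustworthy estimate.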

Another approach to avoid numerical precision issues is to use a module that performs computations in arbitrary-precision arithmetic, such as mpmath. For example, for x=10**5, mpmath evaluates exp(-x) as follows (using the native mpmath exponential function):

import mpmath as mp
print(mp.exp(-10**5))

Note how small this value is. With standard double-precision hardware arithmetic (used by numpy) this value becomes 0.

mpmath offers an integration function (mp.quad), which can provide an accurate estimate of the integral for arbitrary values of the upper integration bound.

import mpmath as mp

print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, mp.inf]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**13]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**8]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**5]))
1.0
0.999999650469474
0.999999999996516
0.999999999999997

We can obtain even more accurate estimates by increasing the working precision to, say, 50 decimal digits (from 15, which is the default):

mp.mp.dps = 50

print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, mp.inf]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**13]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**8]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**5]))
1.0
0.99999999999999999999999999999999999999999829880262
0.99999999999999999999999999999999999999999999997463
0.99999999999999999999999999999999999999999999999998

In general, the cost of this accuracy is increased computation time.
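The time cost can be measured on your own machine. The following is a rough sketch that times the same mp.quad call at 15 and at 50 decimal digits; the helper names cdf_upto and timed are hypothetical, and the timings will vary between systems and runs.

```python
import time
import mpmath as mp

def cdf_upto(b):
    # Same integrand as in the examples above.
    return mp.quad(lambda x: 0.5 * mp.exp(-0.5 * x), [0, b])

def timed(dps, b):
    # workdps sets the working precision only inside the with-block.
    with mp.workdps(dps):
        start = time.perf_counter()
        value = cdf_upto(b)
        elapsed = time.perf_counter() - start
    return value, elapsed

v15, t15 = timed(15, 10**5)
v50, t50 = timed(50, 10**5)

print("15 digits: %.4f s" % t15)
print("50 digits: %.4f s" % t50)
```

Using mp.workdps as a context manager, instead of assigning mp.mp.dps, keeps the precision change local so that later computations are not slowed down unintentionally.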

P.S.: It goes without saying that if you are able to evaluate your integral analytically in the first place (e.g., with the help of Sympy), you can forget all of the above.
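For the density in the question the analytic route is short. The following is a minimal sketch with Sympy; the symbol b for the upper limit is introduced here for illustration.

```python
import sympy as sp

x, b = sp.symbols("x b", positive=True)
pdf = sp.Rational(1, 2) * sp.exp(-x / 2)

# Integral over the whole support, and the CDF up to a symbolic limit b.
total = sp.integrate(pdf, (x, 0, sp.oo))
cdf = sp.integrate(pdf, (x, 0, b))

print(total)
print(sp.simplify(cdf))

# The closed form can then be evaluated exactly for any upper limit.
cdf_at_1e5 = cdf.subs(b, 10**5)
print(cdf_at_1e5)
```

With a closed form available there is no sampling and no interval to tune, so the underflow problem does not arise.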
