Unexpected behavior in scipy isf

Question

I am using scipy's stats module to find the values of a distribution at which the upper-tail probability reaches some small value, but I am getting very unrealistic results. For example:

I fit a beta distribution to an array of squared normalized correlation coefficients from a signal-matching operation (a correlation coefficient always lies between -1 and 1, so its square lies between 0 and 1), using

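import numpy as np
import scipy.stats

# Fit a beta distribution with location fixed at 0 and scale fixed at 1;
# the result is the tuple (a, b, loc, scale).
betaparams = scipy.stats.beta.fit(np.square(data), floc=0, fscale=1)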

This gives the beta distribution parameters (0.42119596435034012, 16939.046996018118, 0, 1). The data array is about 3 million elements long.

Now when I plot the distribution, it is clear that most of the area of the distribution lies very near 0 on the x axis:

import matplotlib.pyplot as plt

x = np.linspace(0, 1, num=1000000)
plt.plot(x, scipy.stats.beta.pdf(x, betaparams[0], betaparams[1]))
plt.xlim([0, .0001])  # zoom in on the region near 0

Now when I try to find the x value at which a given upper-tail probability remains, I get some unexpected behavior. For example:

for expon in [-1,-2,-3,-4,-5,-6,-7,-8,-9,-10]:
    print (expon,scipy.stats.beta.isf(10**expon,betaparams[0],betaparams[1]))

yields:

(-1, 6.9580465891063448e-05)
(-2, 0.00018124328968143608)
(-3, 0.00030250611696189104)
(-4, 0.00042796070123291116)
(-5, 0.0005557482540313166)
(-6, 0.00068501413697673774)
(-7, 0.99999966996999767)
(-8, 0.99999996699699967)
(-9, 0.99999999669970008)
(-10, 0.99999999966997)

Clearly scipy returns wrong values once the upper-tail probability drops to 10**-7 or below: the result jumps from about 0.00069 to nearly 1. My question is why this happens, why it fails silently, and how to fix it.

Thanks

Answer

This appears to be a bug in scipy.special.btdtri, which is supposed to compute quantiles of the beta distribution. Maybe you can file a bug report.

>>> from scipy import special
>>> special.btdtri(betaparams[0], betaparams[1], 1-1e-6)
0.00068501413697504238
>>> special.btdtri(betaparams[0], betaparams[1], 1-1e-7)
0.99999966996999767

I can't figure out where btdtri is defined.
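
Since the failure is silent, one way to at least detect it is a round-trip check: feed the value returned by isf back into sf and compare the result with the requested tail probability. The following is a minimal sketch, not part of SciPy; the helper names tail_mismatch and checked_isf and the tolerance rtol=1e-6 are assumptions made up for illustration.

```python
from scipy import stats


def tail_mismatch(x, q, a, b):
    """Relative gap between the requested tail probability q and sf(x)."""
    achieved = stats.beta.sf(x, a, b)
    return abs(achieved - q) / q


def checked_isf(q, a, b, rtol=1e-6):
    """Call beta.isf, then raise if the result does not round-trip via sf.

    rtol is an arbitrary illustrative tolerance, not a SciPy default.
    """
    x = stats.beta.isf(q, a, b)
    gap = tail_mismatch(x, q, a, b)
    if not gap <= rtol:  # written this way so that NaN also fails
        raise ArithmeticError(
            "isf(%g) returned %r, but sf there differs from q by a "
            "relative gap of %g" % (q, x, gap)
        )
    return x


if __name__ == "__main__":
    # A well-behaved case: Beta(2, 3) with a 10% upper tail.
    print(checked_isf(0.1, 2.0, 3.0))
```

With the parameters from the question, a returned value such as 0.99999966996999767 for a tail probability of 1e-7 would be expected to fail this check, because the survival function is essentially zero that close to 1.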

EDIT: For the record, here is the SciPy bug report: https://github.com/scipy/scipy/issues/4677
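
As a possible workaround on an affected SciPy version, the quantile can be found without the inverse routine by solving sf(x) = q numerically. The sketch below is an assumption-laden illustration rather than the fix adopted in the bug report: the helper name isf_by_bisection is made up, it relies on scipy.stats.beta.sf itself being accurate for these parameters, and it uses plain bisection (scipy.optimize.bisect) on [0, 1] because the survival function is nearly a step here, which can slow down interpolation-based solvers.

```python
from scipy import optimize, stats


def isf_by_bisection(q, a, b):
    """Solve beta.sf(x, a, b) == q for x in [0, 1] by bisection.

    Requires 0 < q < 1, so that sf(0) - q > 0 and sf(1) - q < 0
    bracket the root.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")

    def gap(x):
        # Positive to the left of the quantile, negative to the right.
        return stats.beta.sf(x, a, b) - q

    return optimize.bisect(gap, 0.0, 1.0)


if __name__ == "__main__":
    # Shape parameters reported in the question.
    a, b = 0.42119596435034012, 16939.046996018118
    for expon in range(-1, -11, -1):
        print(expon, isf_by_bisection(10.0 ** expon, a, b))
```

Bisection trades speed for robustness: each call evaluates sf a few dozen times, which is fine for a handful of quantiles but slow for large arrays. If sf is computed internally as 1 - cdf, the smallest tail probabilities lose relative accuracy, so results near 1e-10 should be treated as approximate.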
