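The following code consistently produces histograms with empty bins, even when the number of samples is large. The empty bins appear at regular intervals, yet they are the same width as the other, populated bins. This is obviously wrong. Why does it happen?
It seems that either the rvs method is non-random or the hist binning procedure is broken. Also, try changing the number of bins to 50 and another oddity appears: in that case, every other bin seems to have a spuriously high count.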

""" An example of how to plot histograms using matplotlib
This example samples from a Poisson distribution, plots the histogram
and overlays the Gaussian with the same mean and standard deviation

"""

from scipy.stats import poisson
from scipy.stats import norm
from matplotlib import pyplot as plt
#import matplotlib.mlab as mlab

EV = 100   # the expected value of the distribution
bins = 100 # number of bins in our histogram
n = 10000
RV = poisson(EV)  # Define a Poisson-distributed random variable

samples = RV.rvs(n)  # create a list of n random variates drawn from that random variable

# `normed` was removed from matplotlib; `density=True` is its replacement
events, edges, patches = plt.hist(samples, bins, density=True, histtype='stepfilled')  # make a histogram

print(events)  # When I run this, some bins are empty, even when the number of samples is large

# the pyplot.hist method returns a tuple containing three items. These are events, a list containing
# the counts for each bin, edges, a list containing the values of the lower edge of each bin
# the final element of edges is the value of the high edge of the final bin
# patches, I'm not quite sure about, but we don't need at any rate
# note that we really only need the edges list, but we need to unpack all three elements of the tuple
# for things to work properly, so events and patches here are really just dummy variables

mean = RV.mean()  # If we didn't know these values already, the mean and std methods are convenience
sd = RV.std()     # methods that allow us to retrieve the mean and standard deviation for any random variable

print("Mean is:", mean, " SD is: ", sd)

#print edges

Y = norm.pdf(edges, mean, sd)  # this is how to do it with the sciPy version of a normal PDF
# edges is a list, so this will return a list Y with normal pdf values corresponding to each element of edges

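# NOTE: despite its name, `binwidth` holds roughly 1 / (bin width): edge count divided by data range
binwidth = (len(edges)) / (max(edges) - min(edges))
Y = Y * binwidth
print("Binwidth is:", 1/binwidth)
# The above is a fix to "de-normalize" the normal distribution to properly reflect the bin widths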

#Q = [edges[i+1] - edges[i] for i in range(len(edges)-1)]
#print Q  # This was to confirm that the bins are equally sized, which seems to be the case.

plt.plot(edges, Y)
plt.show()

Best answer

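Empty bins are to be expected when your input data takes only integer values (as is the case for a Poisson RV) and you have more bins than there are integers in the sampled range. In that case each bin is narrower than 1, so some bins can never capture a sample, while with too few bins some bins capture the samples of more than one integer. Change the number of bins and the range so that each bin covers exactly one integer, and the gaps go away.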

plt.hist(samples,
         range=(0, samples.max()),
         bins=samples.max() + 1,   # one bin per integer from 0 to samples.max()
         density=True, histtype='stepfilled')  # `density` replaces the removed `normed`
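
The effect can also be checked numerically without plotting. The following is a minimal sketch, not part of the original answer: it assumes NumPy's own Poisson sampler in place of `scipy.stats.poisson`, uses an arbitrary fixed seed, and aligns the bin edges on half-integers (a variation on the fix above) so that each integer sits at the centre of its own bin.

```python
import numpy as np

# Assumption: NumPy's generator stands in for RV.rvs(n); the seed is arbitrary
rng = np.random.default_rng(0)
samples = rng.poisson(lam=100, size=10_000)

# 100 equal-width bins across the sampled range, as in the question
counts_100, edges_100 = np.histogram(samples, bins=100)
span = int(samples.max() - samples.min())

# One bin per integer: edges at k - 0.5 for every integer k in the sampled range
int_edges = np.arange(samples.min(), samples.max() + 2) - 0.5
counts_int, _ = np.histogram(samples, bins=int_edges)

empty_100 = int((counts_100 == 0).sum())
empty_int = int((counts_int == 0).sum())

print("sampled range (max - min):", span)
print("width of each of the 100 bins:", span / 100)
print("empty bins with bins=100:", empty_100)
print("empty bins with integer-aligned bins:", empty_int)
```

When the sampled range spans fewer than 100 integers, at most `span + 1` of the 100 bins can receive any samples, so the rest must be empty no matter how large `n` is. With integer-aligned bins, any bins that remain empty are confined to the sparse tails, where a particular integer was simply never drawn.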

Regarding "python - Is there a bug in binning in matplotlib histograms, or non-randomness in the rvs method of scipy.stats?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21408843/
