Statistics for histograms of periodic data


Question

For a series of angle values in the (-pi, pi) range, I make a histogram. Is there an effective way to calculate the mean and the modal (most probable) value? Consider the following example:

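import cmath
import numpy as N

deg = N.pi/180.  # radians per degree
d = N.array([-175., 170, 175, 179, -179])*deg
i = N.sum(N.exp(1j*d))  # resultant of the unit vectors exp(1j*angle)
ave = cmath.phase(i)  # circular mean, in radians
i /= float(d.size)  # mean resultant vector; its length is R
stdev = -2. * N.log(N.sqrt(i.real**2 + i.imag**2))  # -2*ln(R)

print(ave/deg, stdev/deg)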

Now, let's build a histogram:

counts, bins = N.histogram(data, N.linspace(-N.pi, N.pi, 360))

Is it possible to calculate the mean and the mode given only counts and bins? For non-periodic data, calculating the mean is straightforward:

ave = sum(counts*bins[:-1]) / float(sum(counts))  # count-weighted mean of the left bin edges

Calculating the modal value requires more effort. Actually, I'm not sure my code below is correct: first, I identify the bins that occur most frequently, and then I calculate their arithmetic mean:

cmax = N.max(counts)  # highest count found in any bin
mode = N.mean(N.take(bins, N.nonzero(counts == cmax)[0]))  # mean left edge of the bins with that count
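
One caveat when several bins tie for the highest count: on a circle, an arithmetic mean of bins lying on either side of ±pi lands near 0, far from both. The following is a minimal sketch, not the asker's method, that represents each bin by its center and averages the tied bins as unit vectors. The function name `circular_mode_from_hist` and the sample angles are hypothetical.

```python
import numpy as np

def circular_mode_from_hist(counts, bins):
    """Return the modal angle (radians) of a histogram of angles."""
    centers = 0.5 * (bins[:-1] + bins[1:])   # bin midpoints
    tied = centers[counts == counts.max()]   # every bin sharing the top count
    # Average the tied bins as unit vectors so ties across +/-pi behave.
    return np.angle(np.sum(np.exp(1j * tied)))

# Two bins tie for the top count, one on each side of the +/-pi cut.
angles = np.deg2rad([-179.0, -179.0, 179.0, 179.0, 10.0])
counts, bins = np.histogram(angles, bins=np.linspace(-np.pi, np.pi, 181))
mode = circular_mode_from_hist(counts, bins)
print(np.rad2deg(mode))
```

Here the tied bins are centered at -179 and 179 degrees, so the result sits at the ±180 degree cut, whereas an arithmetic mean of the two centers would give 0.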

I have no idea how to calculate the standard deviation from such data, though. One obvious solution to all my problems (at least those described above) is to convert the histogram data to a data series and then use that in the calculations. This is inelegant, however, and inefficient.
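
Expanding the histogram back into a data series should not be necessary, because the counts can act as weights in the vector sum. The sketch below is one possible approach under stated assumptions: each bin is represented by its center (using left edges instead shifts the mean by half a bin width), and the circular standard deviation is taken as sqrt(-2*ln(R)), a common definition. The function name `circ_stats_from_hist` is hypothetical.

```python
import numpy as np

def circ_stats_from_hist(counts, bins):
    """Circular mean and std (radians) from histogram counts and bin edges."""
    centers = 0.5 * (bins[:-1] + bins[1:])   # represent each bin by its midpoint
    # Count-weighted mean resultant vector.
    z = np.sum(counts * np.exp(1j * centers)) / np.sum(counts)
    mean = np.angle(z)                       # direction of the resultant
    std = np.sqrt(-2.0 * np.log(np.abs(z)))  # sqrt(-2 ln R)
    return mean, std

deg = np.pi / 180.0
data = np.array([-175.0, 170.0, 175.0, 179.0, -179.0]) * deg
counts, bins = np.histogram(data, bins=np.linspace(-np.pi, np.pi, 360))
mean, std = circ_stats_from_hist(counts, bins)

# The slow route for comparison: rebuild one sample per count, then average.
centers = 0.5 * (bins[:-1] + bins[1:])
expanded = np.repeat(centers, counts)
z_slow = np.mean(np.exp(1j * expanded))
print(mean / deg, std / deg)
```

Both routes use the same binned information, so they agree up to rounding; the weighted form only needs arrays as long as the number of bins.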

Any hints would be greatly appreciated.

This is the partial solution I wrote.

import cmath
import numpy as N
import scipy.stats as ST

d = [-175, 170.2, 175.57, 179, -179, 170.2, 175.57, 170.2]
deg = N.pi/180.
data = N.array(d)*deg

i = N.sum(N.exp(1j*data))
ave = cmath.phase(i)  # correct and exact mean for periodic data
wrong_ave = N.mean(d)  # plain arithmetic mean, in degrees

i /= float(data.size)
stdev = -2. * N.log(N.sqrt(i.real**2 + i.imag**2))
wrong_stdev = N.std(d)  # plain standard deviation, in degrees

bins = N.linspace(-N.pi, N.pi, 360)
counts, bins = N.histogram(data, bins)
# consider it weighted vector addition
nz = N.nonzero(counts)[0]
weight = counts[nz]
i = N.sum(weight * N.exp(1j*bins[nz])/len(nz))
pave = cmath.phase(i)  # correct and approximated mean for periodic data
i /= sum(weight)/float(len(nz))
pstdev = -2. * N.log(N.sqrt(i.real**2 + i.imag**2))
print(' mean: %12.3f %12.3f %12.3f' % (ave/deg, wrong_ave, pave/deg))
print('stdev: %12.3f %12.3f %12.3f' % (stdev/deg, wrong_stdev, pstdev/deg))
print()
print('scipy: %12.3f (mean) %12.3f (stdev)' % (ST.circmean(data)/deg,
                                               ST.circstd(data)/deg))

When run, it gives the following results:

 mean:      175.840       85.843      175.360
stdev:        0.472      151.785        0.430

scipy:      175.840 (mean)        3.673 (stdev)

A few comments now. The first column gives the mean and stdev calculated from the raw data. As can be seen, the mean agrees well with scipy.stats.circmean (thanks to JoeKington for pointing it out). Unfortunately, the stdev differs; I will look at it later. The second column gives completely wrong results (the non-periodic mean/std from numpy obviously does not work here). The third column gives what I wanted to obtain from the histogram data (@JoeKington: my raw data won't fit in my computer's memory; @dmytro: thanks for your input. Of course, the bin size will influence the result, but in my application I don't have much choice, i.e. I have to reduce the data somehow). As can be seen, the mean (third column) is properly calculated, while the stdev needs further attention :)
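
A possible reason for the stdev mismatch, offered as a sketch rather than a diagnosis: the quantity -2*ln(R) behaves like a variance (radians squared), so dividing it by deg does not produce degrees. A widely used definition of the circular standard deviation is its square root, sqrt(-2*ln(R)), which is also how current SciPy documents circstd. The scipy value quoted above depends on the SciPy release that produced it and may follow a different definition. The variable names below are illustrative.

```python
import numpy as np

deg = np.pi / 180.0
d = [-175, 170.2, 175.57, 179, -179, 170.2, 175.57, 170.2]
data = np.array(d) * deg

z = np.mean(np.exp(1j * data))      # mean resultant vector
R = np.abs(z)                       # mean resultant length, between 0 and 1
minus_two_log_r = -2.0 * np.log(R)  # variance-like quantity, radians squared
circ_std = np.sqrt(minus_two_log_r) # circular standard deviation, radians

print('mean : %8.3f deg' % (np.angle(z) / deg))
print('stdev: %8.3f deg' % (circ_std / deg))
```

For tightly clustered angles such as these, the result should be close to an ordinary standard deviation taken after unwrapping the angles around the mean.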

Recommended answer

Have a look at scipy.stats.circmean and scipy.stats.circstd.
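
A short usage sketch for these two functions, assuming the raw angles are available in radians. Both accept `low` and `high` arguments that declare the period of the data; the defaults are 0 and 2*pi, so for angles in (-pi, pi) it may be clearer to pass the range explicitly. The sample values are taken from the question.

```python
import numpy as np
from scipy import stats

deg = np.pi / 180.0
data = np.array([-175.0, 170.0, 175.0, 179.0, -179.0]) * deg

# Declare the data range so the mean is reported within [-pi, pi].
mean = stats.circmean(data, high=np.pi, low=-np.pi)
std = stats.circstd(data, high=np.pi, low=-np.pi)
print(mean / deg, std / deg)
```

The exact value returned by circstd can depend on the installed SciPy release, so it is worth checking the documentation of the version in use.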

Or do you only have the histogram counts, and not the "raw" data? If so, you could fit a Von Mises distribution to your histogram counts and approximate the mean and stddev that way.
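
One way such a fit might look, as a hedged sketch rather than the answerer's exact procedure: normalise the counts to a density, then least-squares fit the Von Mises density at the bin centers with scipy.optimize.curve_fit. The synthetic samples, the helper name `vm_density`, the starting guesses and the bounds are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import vonmises

def vm_density(theta, mu, kappa):
    # Von Mises probability density with mean direction mu and concentration kappa.
    return vonmises.pdf(theta, kappa, loc=mu)

# Synthetic stand-in for the real data (assumed values, for illustration only).
rng = np.random.default_rng(0)
samples = rng.vonmises(mu=2.5, kappa=20.0, size=20000)

counts, bins = np.histogram(samples, bins=np.linspace(-np.pi, np.pi, 361))
centers = 0.5 * (bins[:-1] + bins[1:])
density = counts / (counts.sum() * np.diff(bins))  # counts normalised to a density

# Starting guesses from the count-weighted mean resultant vector.
z = np.sum(counts * np.exp(1j * centers)) / counts.sum()
mu0 = np.angle(z)
kappa0 = 1.0 / (2.0 * (1.0 - np.abs(z)))           # rough guess for concentrated data

(mu_fit, kappa_fit), _ = curve_fit(
    vm_density, centers, density,
    p0=[mu0, kappa0],
    bounds=([-np.pi, 1e-6], [np.pi, np.inf]),      # keep kappa positive
)

# For large kappa the distribution is close to a normal with std 1/sqrt(kappa).
approx_std = 1.0 / np.sqrt(kappa_fit)
print(mu_fit, kappa_fit, approx_std)
```

The fitted mu plays the role of the mean, and kappa controls the spread; the 1/sqrt(kappa) shortcut is only reasonable when the data are well concentrated.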

