Mean values depending on binning with respect to a second variable

Problem description

I am working with Python/NumPy. As input data I have a large number of value pairs (x, y). I basically want to plot <y>(x), i.e., the mean value of y within each bin of x. At the moment I use a plain for loop to achieve this, which is terribly slow.

import numpy

# create example data
x = numpy.random.rand(1000)
y = numpy.random.rand(1000)
# set resolution
xbins = 100
# find x bins
H, xedges, yedges = numpy.histogram2d(x, y, bins=(xbins,xbins) )
# calculate mean and std of y for each x bin
mean = numpy.zeros(xbins)
std = numpy.zeros(xbins)
for i in numpy.arange(xbins):
    mean[i] = numpy.mean(y[ numpy.logical_and( x>=xedges[i], x<xedges[i+1] ) ])
    std[i]  = numpy.std (y[ numpy.logical_and( x>=xedges[i], x<xedges[i+1] ) ])

Is there a way to write this in vectorized form?

Recommended answer

You are complicating things unnecessarily. All you need to know, for every bin in x, is n, sy and sy2: the number of y values in that x bin, the sum of those y values, and the sum of their squares. You can get those as:

>>> import numpy as np
>>> n, _ = np.histogram(x, bins=xbins)
>>> sy, _ = np.histogram(x, bins=xbins, weights=y)
>>> sy2, _ = np.histogram(x, bins=xbins, weights=y*y)

And from those:

>>> mean = sy / n
>>> std = np.sqrt(sy2/n - mean*mean)
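Two edge cases may be worth guarding against when using the snippet above: a bin with no samples makes `sy / n` divide by zero, and floating-point round-off can push `sy2/n - mean*mean` slightly below zero, so the square root returns NaN. The following is a minimal sketch of one way to wrap the same three-histogram idea. The function name `binned_mean_std` and the choice to return NaN for empty bins are assumptions for illustration, not part of the original answer.

```python
import numpy as np

def binned_mean_std(x, y, bins):
    """Per-bin mean and population std (ddof=0) of y, with bins defined on x.

    Hypothetical helper: empty bins are reported as NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    n, edges = np.histogram(x, bins=bins)                 # number of samples per bin
    sy, _ = np.histogram(x, bins=edges, weights=y)        # sum of y per bin
    sy2, _ = np.histogram(x, bins=edges, weights=y * y)   # sum of y**2 per bin

    mean = np.full(n.shape, np.nan)
    var = np.full(n.shape, np.nan)
    filled = n > 0                                        # skip empty bins
    mean[filled] = sy[filled] / n[filled]
    var[filled] = sy2[filled] / n[filled] - mean[filled] ** 2

    # Round-off can leave a tiny negative variance; clip it before the sqrt.
    std = np.sqrt(np.clip(var, 0.0, None))
    return mean, std, edges

# Example data in the spirit of the question
rng = np.random.default_rng(0)
x = rng.random(1000)
y = rng.random(1000)

mean, std, edges = binned_mean_std(x, y, bins=10)
centers = 0.5 * (edges[:-1] + edges[1:])                  # x positions for plotting
```

Reusing `edges` from the first call keeps all three histograms on identical bins. Plotting `mean` against `centers`, with `std` as error bars, gives the <y>(x) curve asked for in the question.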
