Separating a Gaussian mixture in Python

Problem description

I have results from a physics experiment that can be represented as a histogram [i, amount_of(i)]. I suspect the result can be approximated by a mixture of 4 to 6 Gaussian functions.

Is there a Python package that takes a histogram as input and returns the mean and variance of each Gaussian component in the mixture?

Raw data, for example:

Recommended answer

This is a mixture of Gaussians, and it can be estimated with the expectation-maximization (EM) algorithm. EM estimates the mean and variance of each component at the same time as it estimates the mixing proportions, i.e. how the components are mixed together.
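To make the alternation concrete, here is a minimal from-scratch sketch of EM for a one-dimensional Gaussian mixture in NumPy. It is not PyMix's implementation: `em_gmm_1d` is a hypothetical helper name, and the stopping rule (absolute log-likelihood change below `tol`) is an assumption. The E-step computes, for every sample, the probability that it came from each component; the M-step re-estimates weights, means and standard deviations from those probabilities.

```python
import numpy as np

def em_gmm_1d(x, means, sigmas, weights, n_iter=500, tol=1e-6):
    """Fit a 1-D Gaussian mixture to the samples x by expectation maximization.

    means, sigmas, weights: initial guesses, one entry per component.
    Returns the estimated (means, sigmas, weights) as NumPy arrays.
    (Hypothetical helper, written for illustration only.)
    """
    x = np.asarray(x, dtype=float)[:, None]  # shape (n, 1) so it broadcasts against k components
    mu = np.array(means, dtype=float)
    sd = np.array(sigmas, dtype=float)
    pi = np.array(weights, dtype=float)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: pi_j * N(x_i | mu_j, sd_j) for every sample i and component j, shape (n, k)
        dens = pi * np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        total = dens.sum(axis=1, keepdims=True)  # mixture density at each sample
        resp = dens / total                      # responsibilities P(component j | x_i)
        # M-step: responsibility-weighted re-estimates of weights, means and sigmas
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x).sum(axis=0) / nk
        sd = np.sqrt((resp * (x - mu) ** 2).sum(axis=0) / nk)
        # Assumed stopping rule: log-likelihood (of the parameters used in this E-step) stops changing
        ll = np.log(total).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return mu, sd, pi

# Usage on data like the answer's example: 1000 draws from N(0, 1), 2000 from N(6, 2)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 1000), rng.normal(6, 2, 2000)])
mu, sd, pi = em_gmm_1d(x, means=[-1, 1], sigmas=[1, 1], weights=[0.5, 0.5])
print("means:", mu, "sigmas:", sd, "weights:", pi)
```

The initial guesses matter in the same way as in the PyMix example: EM only climbs to a nearby local maximum of the likelihood, so starting values that roughly bracket the visible peaks are usually enough.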

EM for Gaussian mixtures is implemented in the PyMix package. Below, I generate an example mixture of normal distributions and use PyMix to fit a mixture model to it, including estimating what you are interested in: the size of each subpopulation.

# requires numpy and PyMix (matplotlib is only used to draw the histogram)
import numpy as np
from matplotlib import pyplot as plt
import mixture  # PyMix

np.random.seed(10713)  # seed NumPy's generator (used by np.random.normal) so the data are reproducible

# create a mixture of normals, written N(mean, standard deviation):
#  1000 from N(0, 1)
#  2000 from N(6, 2)
mix = np.concatenate([np.random.normal(0, 1, [1000]),
                      np.random.normal(6, 2, [2000])])

# histogram:
plt.hist(mix, bins=20)
plt.savefig("mixture.pdf")

The code above only generates and plots the mixture. It looks like this:

Now use PyMix to estimate the mixing percentages:

data = mixture.DataSet()
data.fromArray(mix)

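# initial guesses are arbitrary (ideally informed by a look at the histogram);
# NormalDistribution(mean, standard deviation)
n1 = mixture.NormalDistribution(-1, 1)
n2 = mixture.NormalDistribution(1, 1)
# two components with equal starting weights
m = mixture.MixtureModel(2, [0.5, 0.5], [n1, n2])

# run expectation maximization: at most 40 iterations, convergence threshold 0.1
m.EM(data, 40, .1)
print(m)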

This prints the fitted model:

G = 2
p = 1
pi =[ 0.33307859  0.66692141]
compFix = [0, 0]
Component 0:
  ProductDist:
  Normal:  [0.0360178848449, 1.03018725918]

Component 1:
  ProductDist:
  Normal:  [5.86848468319, 2.0158608802]

Notice that it recovers the two normals quite accurately (approximately N(0, 1) and N(6, 2)). It also estimates pi, the fraction of the data belonging to each of the two distributions (you mention in the comments that this is what you are most interested in). We drew 1000 points from the first distribution and 2000 from the second, so the true split is 1/3 and 2/3, and it gets this almost exactly right: [0.33307859 0.66692141]. To get this value directly, use m.pi.
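PyMix is an old package written for Python 2, so it may be hard to install today. As a swapped-in alternative (not part of the original answer), scikit-learn's `GaussianMixture` fits the same kind of model by EM. A minimal sketch, assuming scikit-learn is installed: `weights_` plays the role of PyMix's `m.pi`, and since the data are one-dimensional each entry of `covariances_` is a 1x1 matrix holding a variance. The last part compares Bayesian information criterion (BIC) scores for several component counts, one common way to choose among candidates such as "4 to 6" instead of fixing the number in advance.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Same kind of synthetic data: 1000 draws from N(0, 1) and 2000 from N(6, 2)
rng = np.random.default_rng(10713)
mix = np.concatenate([rng.normal(0, 1, 1000), rng.normal(6, 2, 2000)])
X = mix.reshape(-1, 1)  # scikit-learn expects shape (n_samples, n_features)

# Fit a two-component mixture by EM
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
weights = gmm.weights_                      # mixing proportions, the analogue of m.pi
means = gmm.means_.ravel()                  # component means
sigmas = np.sqrt(gmm.covariances_.ravel())  # standard deviations from the 1x1 covariances
for w, m, s in zip(weights, means, sigmas):
    print(f"weight={w:.3f}  mean={m:.3f}  sigma={s:.3f}")

# Unknown number of components: fit several and keep the lowest BIC
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print("number of components with the lowest BIC:", best_k)
```

BIC penalizes each extra component, so it tends to favor the simplest mixture that explains the data; it is a heuristic, and on real measurements it is worth checking the chosen fit against the histogram.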

Some caveats:

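  • This approach takes a vector of values, not a histogram. Converting your data to a 1-D vector should be easy (i.e., turn [(1.4, 2), (2.6, 3)] into [1.4, 1.4, 2.6, 2.6, 2.6]).
  • You have to guess the number of Gaussians in advance (if you ask for a mixture of 2, it will not work out that the data are a mixture of 4).
  • You have to provide some initial estimates for the distributions. If your guesses are even somewhat reasonable, it should converge to the correct estimates.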
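The conversion in the first caveat can be done with `np.repeat`, which repeats each value by its count. This is a sketch assuming the histogram is given as bin values with non-negative integer counts; `histogram_to_samples` is a hypothetical helper name. If your counts are not integers (for example normalized intensities), they would have to be scaled and rounded first, which is an approximation. For a histogram produced by `np.histogram`, the bin centres stand in for the values, which discards any detail inside each bin.

```python
import numpy as np

def histogram_to_samples(values, counts):
    """Expand a histogram (bin values plus integer counts) into a flat 1-D sample vector.

    Hypothetical helper; assumes counts are non-negative integers.
    """
    return np.repeat(np.asarray(values, dtype=float), np.asarray(counts, dtype=int))

# The example from the first caveat: [(1.4, 2), (2.6, 3)]
values, counts = zip(*[(1.4, 2), (2.6, 3)])
samples = histogram_to_samples(values, counts)
print(samples)  # [1.4 1.4 2.6 2.6 2.6]

# A histogram from np.histogram: use the bin centres as the values
bin_counts, edges = np.histogram(np.random.default_rng(0).normal(size=500), bins=20)
centres = (edges[:-1] + edges[1:]) / 2
expanded = histogram_to_samples(centres, bin_counts)
print(len(expanded))  # equals bin_counts.sum(), i.e. 500
```

The resulting vector can be passed straight to `data.fromArray(...)` in the PyMix code above.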
