Understanding the concepts of Gaussian Mixture Models

Problem description

I'm trying to understand GMM by reading the sources available online. I have achieved clustering using k-means and wanted to see how GMM would compare to k-means.

Here is what I have understood, please let me know if my concept is wrong:

GMM is like k-means, in the sense that clustering is achieved in both cases. But in GMM each cluster has its own independent mean and covariance. Furthermore, k-means performs hard assignments of data points to clusters, whereas in GMM we get a collection of independent Gaussian distributions, and for each data point we have a probability that it belongs to each of the distributions.

To understand it better I have used MATLAB to code it and achieve the desired clustering. I have used SIFT features for feature extraction, and k-means clustering to initialize the values. (This is from the VLFeat documentation.)

%images is a 459 x 1 cell array where each cell contains the training image
[locations, all_feats] = vl_dsift(single(images{1}), 'fast', 'step', 50); %all_feats will be 128 x no. of keypoints detected
for i=2:(size(images,1))
    [locations, feats] = vl_dsift(single(images{i}), 'fast', 'step', 50);
    all_feats = cat(2, all_feats, feats); %cat column wise all features
end

numClusters = 50; %Just a random selection.
% Run KMeans to pre-cluster the data
[initMeans, assignments] = vl_kmeans(single(all_feats), numClusters, ...
    'Algorithm','Lloyd', ...
    'MaxNumIterations',5);

initMeans = double(initMeans); %GMM needs it to be double

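% Find the initial means, covariances and priors
numPoints = size(all_feats, 2);
initPriors = zeros(1, numClusters);
initCovariances = zeros(size(all_feats, 1), numClusters);
for i=1:numClusters
    data_k = double(all_feats(:,assignments==i));
    % prior = fraction of all points assigned to cluster i (so the priors sum to 1)
    initPriors(i) = size(data_k,2) / numPoints;

    if size(data_k,2) < 2
        % empty or single-point cluster: fall back to the global per-dimension variances
        initCovariances(:,i) = diag(cov(double(all_feats')));
    else
        initCovariances(:,i) = diag(cov(data_k'));
    end
end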

% Run EM starting from the given parameters
[means,covariances,priors,ll,posteriors] = vl_gmm(double(all_feats), numClusters, ...
    'initialization','custom', ...
    'InitMeans',initMeans, ...
    'InitCovariances',initCovariances, ...
    'InitPriors',initPriors);

Based on the above I have means, covariances and priors. My main question is, What now? I am kind of lost now.

Also, the means and covariances are each of size 128 x 50. I was expecting them to be 1 x 50 since each column is a cluster; won't each cluster have only one mean and covariance? (I know 128 is the number of SIFT feature dimensions, but I was expecting a single mean and covariance per cluster.)

In k-means I used the MATLAB command knnsearch(X,Y), which basically finds the nearest neighbour in X for each point in Y.

So how do I achieve this in GMM? I know it's a collection of probabilities, and of course the nearest match from those probabilities will be our winning cluster. This is where I am confused. All the tutorials online teach how to obtain the means and covariances, but do not say much about how to actually use them for clustering.

Thank you

Solution

I think it would help if you first look at what a GMM model represents. I'll be using functions from the Statistics Toolbox, but you should be able to do the same using VLFeat.

Let's start with the case of a mixture of two 1-dimensional normal distributions. Each Gaussian is represented by a pair of mean and variance. The mixture assigns a weight to each component (prior).

For example, let's mix two normal distributions with equal weights (p = [0.5; 0.5]), the first centered at 0 and the second at 5 (mu = [0; 5]), and the variances equal to 1 and 2 respectively for the first and second distributions (sigma = cat(3, 1, 2)).

As you can see below, the mean effectively shifts the distribution, while the variance determines how wide/narrow and flat/pointy it is. The prior sets the mixing proportions to get the final combined model.

% create GMM
mu = [0; 5];
sigma = cat(3, 1, 2);
p = [0.5; 0.5];
gmm = gmdistribution(mu, sigma, p);

% view PDF
ezplot(@(x) pdf(gmm,x));

The idea of EM clustering is that each distribution represents a cluster. So in the example above with one-dimensional data, if you were given an instance x = 0.5, we would assign it as belonging to the first cluster/mode with 99.5% probability:

>> x = 0.5;
>> posterior(gmm, x)
ans =
    0.9950    0.0050    % probability x came from each component

You can see how the instance falls well under the first bell curve. Whereas if you take a point in the middle, the answer is more ambiguous (the point is assigned to class=2 but with much less certainty):

>> x = 2.2
>> posterior(gmm, 2.2)
ans =
    0.4717    0.5283
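
The posterior values for x = 0.5 follow directly from Bayes' rule: weight each component's density by its prior and normalize. Here is a minimal sketch of that computation for the same 1-D mixture using only base MATLAB. The helper `gauss` and the variable names are illustrative choices, not toolbox functions.

```matlab
% same 1-D mixture as above: means 0 and 5, variances 1 and 2, equal weights
mu = [0; 5];
v  = [1; 2];          % variances (not standard deviations)
w  = [0.5; 0.5];      % mixing weights (priors)

% univariate normal density; "gauss" is an illustrative helper
gauss = @(x, m, s2) exp(-(x - m).^2 ./ (2*s2)) ./ sqrt(2*pi*s2);

x = 0.5;
joint   = w .* gauss(x, mu, v);   % prior times likelihood, one entry per component
density = sum(joint);             % value of the mixture PDF at x
post    = (joint / density)';     % posterior probability of each component
fprintf('%.4f  %.4f\n', post);    % prints 0.9950  0.0050
```

Setting x = 2.2 instead gives the more ambiguous split shown above.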

The same concepts extend to higher dimensions with multivariate normal (MVN) distributions. In more than one dimension, the covariance matrix is a generalization of the variance that accounts for inter-dependencies between features.

Here is an example, again with a mixture of two MVN distributions, this time in 2 dimensions:

% first distribution is centered at (0,0), second at (3,3)
mu = [0 0; 3 3];

% covariance of first is identity matrix, second diagonal
sigma = cat(3, eye(2), [5 0; 0 1]);

% again I'm using equal priors
p = [0.5; 0.5];

% build GMM
gmm = gmdistribution(mu, sigma, p);

% 2D projection
ezcontourf(@(x,y) pdf(gmm,[x y]));

% view PDF surface
ezsurfc(@(x,y) pdf(gmm,[x y]));

There is some intuition behind how the covariance matrix affects the shape of the joint density function. For instance in 2D, if the matrix is diagonal it implies that the two dimensions don't co-vary. In that case the PDF looks like an axis-aligned ellipse stretched out either horizontally or vertically, according to which dimension has the bigger variance. If the variances are equal, then the shape is a perfect circle (the distribution spreads out in both dimensions at an equal rate). Finally, if the covariance matrix is arbitrary (non-diagonal but still symmetric by definition), then it will probably look like a stretched ellipse rotated at some angle.

So in the previous figure, you should be able to tell the two "bumps" apart and which individual distribution each represents. When you go to 3D and higher dimensions, think of it as representing (hyper-)ellipsoids in N dimensions.
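
One way to make the "rotated ellipse" intuition concrete is to look at the eigendecomposition of a covariance matrix: the eigenvectors give the directions of the ellipse axes, and the square roots of the eigenvalues give their relative lengths. The following is a small sketch with an arbitrary example matrix of my own choosing (it does not come from the figures above).

```matlab
% arbitrary symmetric, positive-definite covariance (illustrative values)
Sigma = [2 1.2; 1.2 1];

[V, L] = eig(Sigma);                         % columns of V: axis directions
[lambda, order] = sort(diag(L), 'descend');  % variances along those axes
V = V(:, order);

axisLen  = sqrt(lambda);                     % relative half-lengths (major, minor)
angleDeg = atan2d(V(2,1), V(1,1));           % major-axis orientation, defined up to 180 degrees

fprintf('variances along axes: %.1f and %.1f\n', lambda);  % prints 2.8 and 0.2
```

For a diagonal matrix the eigenvectors are simply the coordinate axes, which is why those ellipses are axis-aligned.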

Now when you're performing clustering using GMM, the goal is to find the model parameters (mean and covariance of each distribution as well as the priors) so that the resulting model best fits the data. The best-fit estimation translates into maximizing the likelihood of the data given the GMM model (meaning you choose the model that maximizes Pr(data|model)).

As others have explained, this is solved iteratively using the EM algorithm. EM starts with an initial estimate or guess of the parameters of the mixture model. It iteratively re-scores the data instances against the mixture density produced by the parameters. The re-scored instances are then used to update the parameter estimates. This is repeated until the algorithm converges.
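
To illustrate the two alternating steps, here is a bare-bones sketch of EM for a 1-D mixture of two Gaussians in base MATLAB. It is only meant to show the structure of the loop, under simplifying assumptions: synthetic data of my own making, a fixed number of iterations instead of a convergence test, and no safeguards against collapsing components. In practice you would use fitgmdist or vl_gmm.

```matlab
% synthetic 1-D data drawn from two Gaussians (illustrative values only)
rng(0);
x = [randn(200,1); 5 + sqrt(2)*randn(200,1)];
N = numel(x);  K = 2;

% deliberately crude initial guess
mu = [min(x); max(x)];  v = [var(x); var(x)];  w = [0.5; 0.5];
gauss = @(x, m, s2) exp(-(x - m).^2 ./ (2*s2)) ./ sqrt(2*pi*s2);

nIter = 50;
ll = zeros(nIter, 1);
for it = 1:nIter
    % E-step: re-score every instance against every component (N x K)
    lik = zeros(N, K);
    for k = 1:K
        lik(:,k) = w(k) * gauss(x, mu(k), v(k));
    end
    ll(it) = sum(log(sum(lik, 2)));             % log-likelihood under current parameters
    r = bsxfun(@rdivide, lik, sum(lik, 2));     % responsibilities, rows sum to 1

    % M-step: update parameters from the responsibility-weighted data
    Nk = sum(r, 1)';                            % effective number of points per component
    for k = 1:K
        mu(k) = sum(r(:,k) .* x) / Nk(k);
        v(k)  = sum(r(:,k) .* (x - mu(k)).^2) / Nk(k);
    end
    w = Nk / N;
end
```

The vector ll should never decrease from one iteration to the next, which is a handy sanity check when writing your own EM loop.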

Unfortunately the EM algorithm is very sensitive to the initialization of the model, so it might take a long time to converge if you set poor initial values, or even get stuck in a local optimum. A better way to initialize the GMM parameters is to use K-means as a first step (like you've shown in your code), and use the mean/cov of those clusters to initialize EM.

As with other cluster analysis techniques, we first need to decide on the number of clusters to use. Cross-validation is a robust way to find a good estimate of the number of clusters.
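
As a rough sketch of that idea, the snippet below scores several candidate values of K by the log-likelihood of held-out data. It assumes the Statistics Toolbox and uses a single hold-out split for brevity, which is a simplification of full k-fold cross-validation. The 'Regularize' and 'Options' arguments mirror the fitgmdist call used later in this answer; the choice of 30% hold-out, the K range, and the use of 'Replicates' are my own assumptions.

```matlab
load fisheriris                         % provides meas (150 x 4)
rng(1);                                 % reproducible split and initialization

c = cvpartition(size(meas,1), 'HoldOut', 0.3);
Xtrain = meas(training(c), :);
Xtest  = meas(test(c), :);

Ks = 1:5;                               % candidate numbers of clusters
heldOutLL = zeros(size(Ks));
for i = 1:numel(Ks)
    gm = fitgmdist(Xtrain, Ks(i), 'Regularize',0.01, 'Replicates',5, ...
        'Options',statset('MaxIter',1000));
    heldOutLL(i) = sum(log(pdf(gm, Xtest)));   % log-likelihood of unseen data
end

[~, best] = max(heldOutLL);
bestK = Ks(best);
```

A model with too many components tends to fit the training data better but score worse on the held-out points, which is what this comparison is trying to detect.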

EM clustering suffers from the fact that there are a lot of parameters to fit, and it usually requires lots of data and many iterations to get good results. An unconstrained model with M mixture components and D-dimensional data involves fitting on the order of D*D*M + D*M + M parameters (M covariance matrices each of size DxD, plus M mean vectors of length D, plus a vector of priors of length M; since covariance matrices are symmetric and the priors sum to one, the number of free parameters is somewhat smaller). That could be a problem for datasets with a large number of dimensions. So it is customary to impose restrictions and assumptions to simplify the problem (a sort of regularization to avoid overfitting). For instance you could constrain the covariance matrices to be diagonal, or even have a single covariance matrix shared across all Gaussians.
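
To get a feel for the numbers, here is the same count evaluated for the sizes in the question (D = 128 SIFT dimensions, M = 50 components), comparing the unconstrained model with the two restrictions just mentioned. This is plain arithmetic using the formula above; the variable names are illustrative.

```matlab
D = 128;   % feature dimension (SIFT descriptors in the question)
M = 50;    % number of mixture components (numClusters in the question)

nFull     = D*D*M + D*M + M;   % full covariance per component, counted as in the text
nDiagonal = D*M   + D*M + M;   % diagonal covariance: D variances per component
nShared   = D*D   + D*M + M;   % one full covariance matrix shared by all components

fprintf('full: %d, diagonal: %d, shared: %d\n', nFull, nDiagonal, nShared);
% prints full: 825650, diagonal: 12850, shared: 22834
```

With diagonal covariances each component is described by a length-D mean and a length-D vector of variances, which would be consistent with the 128 x 50 means and covariances reported in the question.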

Finally, once you've fitted the mixture model, you can explore the clusters by computing the posterior probability of data instances under each mixture component (like I've shown with the 1D example). GMM assigns each instance to a cluster according to this "membership" likelihood.
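
For a model stored the way the question describes (means and diagonal covariances as D x K matrices, priors as a length-K vector), that assignment step could be sketched as follows in base MATLAB. The tiny model and the points in X are hypothetical values chosen for illustration, and the layout is an assumption based on the sizes reported in the question.

```matlab
% hypothetical small model: column k describes component k
D = 2;  K = 2;
means       = [0 3; 0 3];          % D x K component means
covariances = [1 5; 1 1];          % D x K per-dimension variances (diagonal covariances)
priors      = [0.5; 0.5];          % K x 1 mixing weights
X = [0.1 2.9 1.5; -0.2 3.1 1.5];   % D x N new points, one per column

N = size(X, 2);
logp = zeros(K, N);                % log( prior_k * N(x_n | mean_k, var_k) )
for k = 1:K
    d    = bsxfun(@minus, X, means(:,k));                      % deviations from the mean
    maha = sum(bsxfun(@rdivide, d.^2, covariances(:,k)), 1);   % squared Mahalanobis distance
    logp(k,:) = log(priors(k)) ...
        - 0.5 * (D*log(2*pi) + sum(log(covariances(:,k))) + maha);
end

% normalize in the log domain for numerical stability
post = exp(bsxfun(@minus, logp, max(logp, [], 1)));
post = bsxfun(@rdivide, post, sum(post, 1));   % K x N posteriors, columns sum to 1

[~, clusterIdx] = max(post, [], 1);            % hard assignment: most probable component
```

This plays the role that knnsearch played for k-means: each new point goes to the component with the highest posterior, and the posterior itself tells you how confident that assignment is.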

Here is a more complete example of clustering data using Gaussian mixture models:

% load Fisher Iris dataset
load fisheriris

% project it down to 2 dimensions for the sake of visualization
[~,data] = pca(meas,'NumComponents',2);
mn = min(data); mx = max(data);
D = size(data,2);    % data dimension

% initial kmeans step used to initialize EM
K = 3;               % number of mixtures/clusters
cInd = kmeans(data, K, 'EmptyAction','singleton');

% fit a GMM model
gmm = fitgmdist(data, K, 'Options',statset('MaxIter',1000), ...
    'CovType','full', 'SharedCov',false, 'Regularize',0.01, 'Start',cInd);

% means, covariances, and mixing-weights
mu = gmm.mu;
sigma = gmm.Sigma;
p = gmm.PComponents;

% cluster assignment and posterior probability of each instance
% note that: [~,clustInd] = max(post,[],2)
[clustInd,~,post] = cluster(gmm, data);
tabulate(clustInd)

% plot data, clustering of the entire domain, and the GMM contours
clrLite = [1 0.6 0.6 ; 0.6 1 0.6 ; 0.6 0.6 1];
clrDark = [0.7 0 0 ; 0 0.7 0 ; 0 0 0.7];
[X,Y] = meshgrid(linspace(mn(1),mx(1),50), linspace(mn(2),mx(2),50));
C = cluster(gmm, [X(:) Y(:)]);
image(X(:), Y(:), reshape(C,size(X))), hold on
gscatter(data(:,1), data(:,2), species, clrDark)
h = ezcontour(@(x,y)pdf(gmm,[x y]), [mn(1) mx(1) mn(2) mx(2)]);
set(h, 'LineColor','k', 'LineStyle',':')
hold off, axis xy, colormap(clrLite)
title('2D data and fitted GMM'), xlabel('PC1'), ylabel('PC2')
