Problem description
I have been reading about Self-Organizing Maps, and I understand the algorithm (I think), but something still eludes me.
How do you interpret the trained network?
How would you then actually use it for, say, a classification task (once you have done the clustering with your training data)?
All of the material I seem to find (printed and digital) focuses on the training of the algorithm. I believe I may be missing something crucial.
Thanks
Recommended answer
SOMs are mainly a dimensionality reduction algorithm, not a classification tool. They are used for dimensionality reduction just like PCA and similar methods (once trained, you can check which neuron is activated by your input and use this neuron's position as the value); the only real difference is their ability to preserve a given topology in the output representation.
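To make "check which neuron is activated and use its position as the value" concrete, here is a minimal sketch. It assumes the trained map is stored as a NumPy array of shape (rows, cols, dim) and that "activated" means the neuron with the smallest Euclidean distance to the input (the usual best matching unit). The name `bmu_position` and the tiny hand-written codebook are purely illustrative, not part of any library.

```python
import numpy as np

def bmu_position(weights, x):
    """Return the (row, col) grid position of the neuron closest to x."""
    # weights: (rows, cols, dim) codebook; x: (dim,) input vector
    dists = np.linalg.norm(weights - x, axis=2)
    row, col = np.unravel_index(np.argmin(dists), dists.shape)
    return int(row), int(col)

# Hypothetical, hand-written 2x2 map over a 3-dimensional input space
weights = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]],
])
x = np.array([0.9, 0.8, 0.7])

pos = bmu_position(weights, x)  # (1, 1): the 3-D input is now described by 2 grid coordinates
```

The returned pair is the reduced representation of `x`: a point in the map's lattice rather than in the original input space.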
So what a SOM actually produces is a mapping from your input space X to the reduced space Y (most commonly a 2D lattice, making Y a 2-dimensional space). To perform actual classification, you should transform your data through this mapping and then run some other classification model (SVM, neural network, decision tree, etc.).
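The "map first, classify second" pipeline could look roughly like the sketch below. Everything in it is an assumption made for illustration: the tiny SOM trainer with linearly decaying learning rate and neighbourhood width, the synthetic two-class data, and the plain k-nearest-neighbour vote standing in for "some other classification model". In practice you would likely use an existing SOM implementation and a library classifier.

```python
import numpy as np

def train_som(data, rows=6, cols=6, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Very small online SOM trainer (illustrative, not tuned)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # grid[r, c] holds the lattice coordinates (r, c) of each neuron
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 0.5     # neighbourhood width shrinks
            d = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=2)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def som_transform(weights, data):
    """Map each input vector to the (row, col) of its best matching unit."""
    rows, cols, dim = weights.shape
    flat = weights.reshape(-1, dim)
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    idx = np.argmin(d, axis=1)
    return np.stack(np.unravel_index(idx, (rows, cols)), axis=1).astype(float)

def knn_predict(train_pts, train_labels, query_pts, k=5):
    """Majority vote among the k nearest training points (in map space)."""
    d = np.linalg.norm(query_pts[:, None, :] - train_pts[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in train_labels[nearest]])

# Synthetic two-class data in the 5-dimensional input space X
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.2, 0.05, (100, 5)), rng.normal(0.8, 0.05, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

weights = train_som(X)
Y2d = som_transform(weights, X)          # data expressed in the reduced space Y
pred = knn_predict(Y2d, y, Y2d, k=5)     # classifier runs on map coordinates only
accuracy = float(np.mean(pred == y))
```

Note that the classifier never sees the original 5-dimensional vectors, only their lattice coordinates. The accuracy here is measured on the training data itself, so it only shows that the pipeline runs, not how well it generalizes.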
In other words, SOMs are used for finding another representation of the data: a representation that is easy for humans to analyze further (as it is mostly 2-dimensional and can be plotted) and very easy for any further classification model to work with. This is a great method for visualizing high-dimensional data, analyzing "what is going on", seeing how some classes are grouped geometrically, etc. But they should not be confused with other neural models like artificial neural networks or even growing neural gas (which is a very similar concept, yet gives a direct data clustering), as they serve a different purpose.
Of course, one can use SOMs directly for classification, but this is a modification of the original idea that requires a different data representation, and in general it does not work as well as using some other classifier on top of the SOM.
EDIT
There are at least a few ways of visualizing the trained SOM:
- One can render the SOM's neurons as points in the input space, with edges connecting the topologically close ones (this is possible only if the input space has a small number of dimensions, like 2-3).
- One can display data classes on the SOM's topology. If your data is labeled with some numbers {1,..k}, we can bind some k colors to them; for the binary case let us consider blue and red. Next, for each data point we calculate its corresponding neuron in the SOM and add this label's color to the neuron. Once all data have been processed, we plot the SOM's neurons, each at its original position in the topology, with the color being some aggregate (e.g. the mean) of the colors assigned to it. This approach, if we use some simple topology like a 2D grid, gives us a nice low-dimensional representation of the data. In the following image, the subimages from the third one to the end are the results of such a visualization, where red means label 1 ("yes" answer) and blue means label 2 ("no" answer).
- One can also visualize the inter-neuron distances by calculating how far apart each pair of connected neurons is and plotting that on the SOM's map (second subimage in the above visualization).
- One can cluster the neurons' positions with some clustering algorithm (like K-means) and visualize the cluster IDs as colors (first subimage).
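Two of the visualizations in the list, the per-neuron class coloring and the inter-neuron distance map, boil down to computing one number per neuron. A rough sketch follows, assuming a rectangular grid stored as a (rows, cols, dim) NumPy array, numeric labels, the mean as the aggregate, and 4-connected neighbours for the distance map. The function names and the hand-written codebook are illustrative only.

```python
import numpy as np

def bmu_indices(weights, data):
    """Row and column indices of the best matching unit for every data point."""
    rows, cols, dim = weights.shape
    flat = weights.reshape(-1, dim)
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    return np.unravel_index(np.argmin(d, axis=1), (rows, cols))

def label_map(weights, data, labels):
    """Mean label of the points mapped to each neuron (NaN where none map)."""
    rows, cols, _ = weights.shape
    sums = np.zeros((rows, cols))
    counts = np.zeros((rows, cols))
    r, c = bmu_indices(weights, data)
    np.add.at(sums, (r, c), labels)   # accumulate labels per neuron
    np.add.at(counts, (r, c), 1)      # count hits per neuron
    out = np.full((rows, cols), np.nan)
    hit = counts > 0
    out[hit] = sums[hit] / counts[hit]
    return out

def u_matrix(weights):
    """Mean input-space distance from each neuron to its 4-connected neighbours."""
    rows, cols, _ = weights.shape
    total = np.zeros((rows, cols))
    n = np.zeros((rows, cols))
    dv = np.linalg.norm(weights[1:] - weights[:-1], axis=2)        # vertical pairs
    dh = np.linalg.norm(weights[:, 1:] - weights[:, :-1], axis=2)  # horizontal pairs
    total[1:] += dv
    n[1:] += 1
    total[:-1] += dv
    n[:-1] += 1
    total[:, 1:] += dh
    n[:, 1:] += 1
    total[:, :-1] += dh
    n[:, :-1] += 1
    return total / n

# Hypothetical 2x3 map over a 2-D input space, with a gap before the last column
weights = np.array([
    [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]],
    [[0.0, 1.0], [1.0, 1.0], [5.0, 1.0]],
])
data = np.array([[0.1, 0.0], [0.0, 0.2], [1.0, 0.1],
                 [0.9, 0.9], [1.1, 1.0], [4.9, 0.9], [5.1, 1.1]])
labels = np.array([1, 1, 1, 1, 2, 2, 2])

lm = label_map(weights, data, labels)  # e.g. 1.5 marks a neuron shared by both classes
um = u_matrix(weights)                 # large values mark borders between regions
```

Both results are (rows, cols) arrays, so they can be drawn as heatmaps, for instance with matplotlib's `imshow`. Mixed values in `lm` correspond to the blended colors described above, and high values in `um` show where neighbouring neurons are far apart in the input space.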