Clustering and Matlab


Problem description

I'm trying to cluster some data I have from the KDD Cup 1999 dataset.

The contents of the file look like this:

0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.

There are 48 thousand different records in that format. I have cleaned the data up and removed the text, keeping only the numbers. The output now looks like this:
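The cleaning step described above could be sketched as follows, using the sample record shown earlier. This is only an illustration under assumptions: the variable names are hypothetical, and it drops every field that does not parse as a number, which may not be how the data was actually cleaned.

```matlab
% Sample record copied from the post (41 features plus a class label).
record = '0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.';

% Split on commas and try to parse every field as a number.
fields = strsplit(record, ',');
values = str2double(fields);          % text fields become NaN

% Keep only the fields that parsed as numbers.
numericValues = values(~isnan(values));

fprintf('%d fields in, %d numeric fields kept\n', ...
    numel(fields), numel(numericValues));
```

For this record the three text features and the trailing label are dropped, leaving the 38 numeric columns mentioned in the post. On the full dataset, removing those columns by position may be safer than relying on parse failures.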

I created a comma-delimited file in Excel, saved it as a CSV file, and then created a data source from that CSV file in Matlab. I've tried running it through the fcm toolbox in Matlab (findcluster outputs 38 data types, which is expected with 38 columns).
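A rough sketch of loading such a CSV file and preparing it for fcm is given below. The file name and the tiny stand-in matrix are hypothetical, and the min-max scaling step is an assumption rather than something the post does: the byte-count columns in the sample record are far larger than the rate columns, so a distance-based method may be dominated by them. The sketch assumes a Matlab release that provides readmatrix and writematrix (R2019a or later).

```matlab
% Hypothetical file name; a tiny stand-in matrix is written first so the
% sketch runs on its own. Replace this with the real cleaned CSV file.
csvFile = [tempname '.csv'];
demo = [0 239 486 8; 0 1500 20000 19; 2 0 0 1];
writematrix(demo, csvFile);

% Read the numeric CSV into a matrix (rows = records, columns = features).
X = readmatrix(csvFile);

% Min-max scale each column to [0, 1]; constant columns are left at 0.
colMin = min(X, [], 1);
colRange = max(X, [], 1) - colMin;
colRange(colRange == 0) = 1;          % avoid division by zero
Xscaled = (X - colMin) ./ colRange;

% With the Fuzzy Logic Toolbox installed, the scaled data could then be
% passed to fcm in the same way as in the post:
% [center, U, objFcn] = fcm(Xscaled, 2);

delete(csvFile);                      % remove the temporary file
```

Whether scaling changes the clusters in a useful way depends on the data; it is offered here only as one thing that could be tried.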

The clusters, however, don't look like clusters, or it's not accepting the data and working the way I need it to.

Could anyone help me find the clusters? I'm new to Matlab, so I don't have any experience, and I'm also new to clustering.

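Method:

1. Choose the number of clusters (K).
2. Initialize the centroids (K patterns chosen at random from the data set).
3. Assign each pattern to the cluster with the closest centroid.
4. Calculate the mean of each cluster to be its new centroid.
5. Repeat steps 3 and 4 until a stopping criterion is met (no pattern moves to another cluster).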
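The steps above describe plain k-means with hard assignments, which is not the same algorithm as the fuzzy c-means (fcm) call used later in the post. A minimal sketch of those steps in base Matlab, run on made-up toy data, might look like this. The data, the variable names, and the iteration cap of 100 are all assumptions for illustration; the sketch relies on implicit expansion (R2016b or later).

```matlab
rng(0);                                 % fixed seed so the run is repeatable
X = [randn(50, 2); randn(50, 2) + 6];   % toy data: two separated blobs
K = 2;                                  % step 1: choose the number of clusters

% Step 2: pick K distinct rows of X as the initial centroids.
centroids = X(randperm(size(X, 1), K), :);
labels = zeros(size(X, 1), 1);

for iter = 1:100
    % Step 3: assign each pattern to the closest centroid
    % (squared Euclidean distance).
    dists = zeros(size(X, 1), K);
    for k = 1:K
        dists(:, k) = sum((X - centroids(k, :)).^2, 2);
    end
    [~, newLabels] = min(dists, [], 2);

    % Step 5: stop once no pattern moves to another cluster.
    if isequal(newLabels, labels)
        break
    end
    labels = newLabels;

    % Step 4: move each centroid to the mean of its cluster.
    for k = 1:K
        if any(labels == k)
            centroids(k, :) = mean(X(labels == k, :), 1);
        end
    end
end
```

After the loop, labels holds a cluster index for every row of X and centroids holds one row per cluster. The Statistics and Machine Learning Toolbox also ships a kmeans function, which would normally be preferred over a hand-written loop.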

This is what I am trying to achieve:

This is what I get:

load kddcup1.dat
plot(kddcup1(:,1),kddcup1(:,2),'o')
[center,U,objFcn] = fcm(kddcup1,2);
Iteration count = 1, obj. fcn = 253224062681230720.000000
Iteration count = 2, obj. fcn = 241493132059137410.000000
Iteration count = 3, obj. fcn = 241484544542298110.000000
Iteration count = 4, obj. fcn = 241439204971005280.000000
Iteration count = 5, obj. fcn = 241090628742523840.000000
Iteration count = 6, obj. fcn = 239363408546874750.000000
Iteration count = 7, obj. fcn = 238580863900727680.000000
Iteration count = 8, obj. fcn = 238346826370420990.000000
Iteration count = 9, obj. fcn = 237617756429912510.000000
Iteration count = 10, obj. fcn = 226364785036628320.000000
Iteration count = 11, obj. fcn = 94590774984961184.000000
Iteration count = 12, obj. fcn = 2220521449216102.500000
Iteration count = 13, obj. fcn = 2220521273191876.200000
Iteration count = 14, obj. fcn = 2220521273191876.700000
Iteration count = 15, obj. fcn = 2220521273191876.700000

figure
plot(objFcn)
title('Objective Function Values')
xlabel('Iteration Count')
ylabel('Objective Function Value')

% Assign each record to the cluster with the highest membership grade.
maxU = max(U);
index1 = find(U(1, :) == maxU);
index2 = find(U(2, :) == maxU);

% Plot the first two columns, coloured by cluster, with the centres on top.
figure
line(kddcup1(index1, 1), kddcup1(index1, 2), 'linestyle', ...
    'none', 'marker', 'o', 'color', 'g');
line(kddcup1(index2, 1), kddcup1(index2, 2), 'linestyle', ...
    'none', 'marker', 'x', 'color', 'r');
hold on
plot(center(1, 1), center(1, 2), 'ko', 'markersize', 15, 'LineWidth', 2)
plot(center(2, 1), center(2, 2), 'kx', 'markersize', 15, 'LineWidth', 2)

findcluster


Recommended answer