MATLAB实例:PCA降维

作者:凯鲁嘎吉 - 博客园 http://www.cnblogs.com/kailugaji/

1. iris数据

5.1,3.5,1.4,0.2,1
4.9,3.0,1.4,0.2,1
4.7,3.2,1.3,0.2,1
4.6,3.1,1.5,0.2,1
5.0,3.6,1.4,0.2,1
5.4,3.9,1.7,0.4,1
4.6,3.4,1.4,0.3,1
5.0,3.4,1.5,0.2,1
4.4,2.9,1.4,0.2,1
4.9,3.1,1.5,0.1,1
5.4,3.7,1.5,0.2,1
4.8,3.4,1.6,0.2,1
4.8,3.0,1.4,0.1,1
4.3,3.0,1.1,0.1,1
5.8,4.0,1.2,0.2,1
5.7,4.4,1.5,0.4,1
5.4,3.9,1.3,0.4,1
5.1,3.5,1.4,0.3,1
5.7,3.8,1.7,0.3,1
5.1,3.8,1.5,0.3,1
5.4,3.4,1.7,0.2,1
5.1,3.7,1.5,0.4,1
4.6,3.6,1.0,0.2,1
5.1,3.3,1.7,0.5,1
4.8,3.4,1.9,0.2,1
5.0,3.0,1.6,0.2,1
5.0,3.4,1.6,0.4,1
5.2,3.5,1.5,0.2,1
5.2,3.4,1.4,0.2,1
4.7,3.2,1.6,0.2,1
4.8,3.1,1.6,0.2,1
5.4,3.4,1.5,0.4,1
5.2,4.1,1.5,0.1,1
5.5,4.2,1.4,0.2,1
4.9,3.1,1.5,0.1,1
5.0,3.2,1.2,0.2,1
5.5,3.5,1.3,0.2,1
4.9,3.1,1.5,0.1,1
4.4,3.0,1.3,0.2,1
5.1,3.4,1.5,0.2,1
5.0,3.5,1.3,0.3,1
4.5,2.3,1.3,0.3,1
4.4,3.2,1.3,0.2,1
5.0,3.5,1.6,0.6,1
5.1,3.8,1.9,0.4,1
4.8,3.0,1.4,0.3,1
5.1,3.8,1.6,0.2,1
4.6,3.2,1.4,0.2,1
5.3,3.7,1.5,0.2,1
5.0,3.3,1.4,0.2,1
7.0,3.2,4.7,1.4,2
6.4,3.2,4.5,1.5,2
6.9,3.1,4.9,1.5,2
5.5,2.3,4.0,1.3,2
6.5,2.8,4.6,1.5,2
5.7,2.8,4.5,1.3,2
6.3,3.3,4.7,1.6,2
4.9,2.4,3.3,1.0,2
6.6,2.9,4.6,1.3,2
5.2,2.7,3.9,1.4,2
5.0,2.0,3.5,1.0,2
5.9,3.0,4.2,1.5,2
6.0,2.2,4.0,1.0,2
6.1,2.9,4.7,1.4,2
5.6,2.9,3.6,1.3,2
6.7,3.1,4.4,1.4,2
5.6,3.0,4.5,1.5,2
5.8,2.7,4.1,1.0,2
6.2,2.2,4.5,1.5,2
5.6,2.5,3.9,1.1,2
5.9,3.2,4.8,1.8,2
6.1,2.8,4.0,1.3,2
6.3,2.5,4.9,1.5,2
6.1,2.8,4.7,1.2,2
6.4,2.9,4.3,1.3,2
6.6,3.0,4.4,1.4,2
6.8,2.8,4.8,1.4,2
6.7,3.0,5.0,1.7,2
6.0,2.9,4.5,1.5,2
5.7,2.6,3.5,1.0,2
5.5,2.4,3.8,1.1,2
5.5,2.4,3.7,1.0,2
5.8,2.7,3.9,1.2,2
6.0,2.7,5.1,1.6,2
5.4,3.0,4.5,1.5,2
6.0,3.4,4.5,1.6,2
6.7,3.1,4.7,1.5,2
6.3,2.3,4.4,1.3,2
5.6,3.0,4.1,1.3,2
5.5,2.5,4.0,1.3,2
5.5,2.6,4.4,1.2,2
6.1,3.0,4.6,1.4,2
5.8,2.6,4.0,1.2,2
5.0,2.3,3.3,1.0,2
5.6,2.7,4.2,1.3,2
5.7,3.0,4.2,1.2,2
5.7,2.9,4.2,1.3,2
6.2,2.9,4.3,1.3,2
5.1,2.5,3.0,1.1,2
5.7,2.8,4.1,1.3,2
6.3,3.3,6.0,2.5,3
5.8,2.7,5.1,1.9,3
7.1,3.0,5.9,2.1,3
6.3,2.9,5.6,1.8,3
6.5,3.0,5.8,2.2,3
7.6,3.0,6.6,2.1,3
4.9,2.5,4.5,1.7,3
7.3,2.9,6.3,1.8,3
6.7,2.5,5.8,1.8,3
7.2,3.6,6.1,2.5,3
6.5,3.2,5.1,2.0,3
6.4,2.7,5.3,1.9,3
6.8,3.0,5.5,2.1,3
5.7,2.5,5.0,2.0,3
5.8,2.8,5.1,2.4,3
6.4,3.2,5.3,2.3,3
6.5,3.0,5.5,1.8,3
7.7,3.8,6.7,2.2,3
7.7,2.6,6.9,2.3,3
6.0,2.2,5.0,1.5,3
6.9,3.2,5.7,2.3,3
5.6,2.8,4.9,2.0,3
7.7,2.8,6.7,2.0,3
6.3,2.7,4.9,1.8,3
6.7,3.3,5.7,2.1,3
7.2,3.2,6.0,1.8,3
6.2,2.8,4.8,1.8,3
6.1,3.0,4.9,1.8,3
6.4,2.8,5.6,2.1,3
7.2,3.0,5.8,1.6,3
7.4,2.8,6.1,1.9,3
7.9,3.8,6.4,2.0,3
6.4,2.8,5.6,2.2,3
6.3,2.8,5.1,1.5,3
6.1,2.6,5.6,1.4,3
7.7,3.0,6.1,2.3,3
6.3,3.4,5.6,2.4,3
6.4,3.1,5.5,1.8,3
6.0,3.0,4.8,1.8,3
6.9,3.1,5.4,2.1,3
6.7,3.1,5.6,2.4,3
6.9,3.1,5.1,2.3,3
5.8,2.7,5.1,1.9,3
6.8,3.2,5.9,2.3,3
6.7,3.3,5.7,2.5,3
6.7,3.0,5.2,2.3,3
6.3,2.5,5.0,1.9,3
6.5,3.0,5.2,2.0,3
6.2,3.4,5.4,2.3,3
5.9,3.0,5.1,1.8,3

2. MATLAB程序

function [COEFF,SCORE,latent,tsquared,explained,mu,data_PCA]=pca_demo()
x=load('iris.data');
[~,d]=size(x);
k=d-1; %前k个主成分
x=zscore(x(:,1:d-1));  %归一化数据
[COEFF,SCORE,latent,tsquared,explained,mu]=pca(x);
% 1)获取样本数据 X ,样本为行,特征为列。
% 2)对样本数据中心化,得S(S = X的各列减去各列的均值)。
% 3)求 S 的协方差矩阵 C = cov(S)
% 4) 对协方差矩阵 C 进行特征分解 [P,Lambda] = eig(C);
% 5)结束。
% 1、输入参数 X 是一个 n 行 p 列的矩阵。每行代表一个样本观察数据,每列则代表一个属性,或特征。
% 2、COEFF 就是所需要的特征向量组成的矩阵,是一个 p 行 p 列的矩阵,没列表示一个出成分向量,经常也称为(协方差矩阵的)特征向量。并且是按照对应特征值降序排列的。所以,如果只需要前 k 个主成分向量,可通过:COEFF(:,1:k) 来获得。
% 3、SCORE 表示原数据在各主成分向量上的投影。但注意:是原数据经过中心化后在主成分向量上的投影。即通过:SCORE = x0*COEFF 求得。其中 x0 是中心平移后的 X(注意:是对维度进行中心平移,而非样本。),因此在重建时,就需要加上这个平均值了。
% 4、latent 是一个列向量,表示特征值,并且按降序排列。
% 5、tsquared Hotelling的每个观测值X的T平方统计量
% 6、explained 由每个主成分解释的总方差的百分比
% 7、mu 每个变量X的估计平均值
data_PCA=x*COEFF(:,1:k);
latent1=100*latent/sum(latent);%将latent总和统一为100,便于观察贡献率
pareto(latent1);%调用matla画图 pareto仅绘制累积分布的前95%,因此y中的部分元素并未显示
xlabel('Principal Component');
ylabel('Variance Explained (%)');
% 图中的线表示的累积变量解释程度
print(gcf,'-dpng','Iris PCA.png');
iris_pac=data_PCA(:,1:2) ;
save iris_pca iris_pac

3. 结果

iris_pca:前两个主成分

-2.25698063306803	0.504015404227653
-2.07945911889541	-0.653216393612590
-2.36004408158421	-0.317413944570283
-2.29650366000389	-0.573446612971233
-2.38080158645275	0.672514410791076
-2.06362347633724	1.51347826673567
-2.43754533573242	0.0743137171331950
-2.22638326740708	0.246787171742162
-2.33413809644009	-1.09148977019584
-2.18136796941948	-0.447131117450110
-2.15626287481026	1.06702095645556
-2.31960685513084	0.158057945820095
-2.21665671559727	-0.706750478104682
-2.63090249246321	-0.935149145374822
-2.18497164997156	1.88366804891533
-2.24394778052703	2.71328133141014
-2.19539570001472	1.50869601039751
-2.18286635818774	0.512587093716441
-1.88775015418968	1.42633236069007
-2.33213619695782	1.15416686250116
-1.90816386828207	0.429027879924458
-2.19728429051438	0.949277150423224
-2.76490709741649	0.487882574439700
-1.81433337754274	0.106394361814184
-2.22077768737273	0.161644638073716
-1.95048968523510	-0.605862870440206
-2.04521166172712	0.265126114804279
-2.16095425532709	0.550173363315497
-2.13315967968331	0.335516397664229
-2.26121491382610	-0.313827252316662
-2.13739396044139	-0.482326258880086
-1.82582143036022	0.443780130732953
-2.59949431958629	1.82237008322707
-2.42981076672382	2.17809479520796
-2.18136796941948	-0.447131117450110
-2.20373717203888	-0.183722323644913
-2.03759040170113	0.682669420156327
-2.18136796941948	-0.447131117450110
-2.42781878392261	-0.879223932713649
-2.16329994558551	0.291749566745466
-2.27889273592867	0.466429134628597
-1.86545776627869	-2.31991965918865
-2.54929404704891	-0.452301129580194
-1.95772074352968	0.495730895348582
-2.12624969840005	1.16752080832811
-2.06842816583668	-0.689607099127106
-2.37330741591874	1.14679073709691
-2.39018434748641	-0.361180775489047
-2.21934619663183	1.02205856145225
-2.19858869176329	0.0321302060908945
1.10030752013391	0.860230593245533
0.730035752246062	0.596636784545418
1.23796221659453	0.612769614333371
0.395980710562889	-1.75229858398514
1.06901265623960	-0.211050862633647
0.383174475987114	-0.589088965722193
0.746215185580377	0.776098608766709
-0.496201068006129	-1.84269556949638
0.923129796737431	0.0302295549588077
0.00495143780650871	-1.02596403732389
-0.124281108093219	-2.64918765259090
0.437265238506424	-0.0586846858581760
0.549792126592992	-1.76666307900171
0.714770518429262	-0.184815166484382
-0.0371339806719297	-0.431350035919633
0.872966018474250	0.508295314415273
0.346844440799832	-0.189985178614466
0.152880381053472	-0.788085297090142
1.21124542423444	-1.62790202112846
0.156417163578196	-1.29875232891050
0.735791135537219	0.401126570248885
0.470792483676532	-0.415217206131680
1.22388807504403	-0.937773165086814
0.627279600231826	-0.415419947028686
0.698133985336190	-0.0632819273014206
0.870620328215835	0.249871517845242
1.25003445866275	-0.0823442389434431
1.35370481019450	0.327722365822153
0.659915359649250	-0.223597000167979
-0.0471236447211597	-1.05368247816741
0.121128417400412	-1.55837168956507
0.0140710866007487	-1.56813894313840
0.235222818975321	-0.773333046281646
1.05316323317206	-0.634774729305402
0.220677797156699	-0.279909968621073
0.430341476713787	0.852281697154445
1.04590946111265	0.520453696157683
1.03241950881290	-1.38781716762055
0.0668436673617666	-0.211910813930204
0.274505447436587	-1.32537578085168
0.271425764670620	-1.11570381243558
0.621089830946741	0.0274506709978046
0.328903506457842	-0.985598883763833
-0.372380114621411	-2.01119457605980
0.281999617970590	-0.851099454545845
0.0887557702224096	-0.174324544331148
0.223607676665854	-0.379214256409087
0.571967341693057	-0.153206717308028
-0.455486948803962	-1.53432438068788
0.251402252309636	-0.593871222060355
1.84150338645482	0.868786147264828
1.14933941416981	-0.698984450845645
2.19898270027627	0.552618780551384
1.43388176486790	-0.0498435417617587
1.86165398830779	0.290220535935809
2.74500070081969	0.785799704159685
0.357177895625210	-1.55488557249365
2.29531637451915	0.408149356863061
1.99505169024551	-0.721448439846371
2.25998344407884	1.91502747107928
1.36134878398531	0.691631011499905
1.59372545693795	-0.426818952656741
1.87796051113409	0.412949339203311
1.24890257443547	-1.16349352357816
1.45917315700813	-0.442664601834978
1.58649439864337	0.674774813132046
1.46636772102851	0.252347085727036
2.42924030093571	2.54822056527013
3.29809226641255	-0.00235343587272177
1.24979406018816	-1.71184899071237
2.03368323142868	0.904369044486726
0.970663302005081	-0.569267277965818
2.88838806680663	0.396463170625287
1.32475563655861	-0.485135293486995
1.69855040646181	1.01076227706927
1.95119099025002	0.999984474306318
1.16799162725452	-0.317831851008113
1.01637609822602	0.0653241212065782
1.78004554289349	-0.192627479858818
1.85855159177699	0.553527164026207
2.42736549094542	0.245830911619345
2.30834922706014	2.61741528404554
1.85415981777379	-0.184055790370030
1.10756129219332	-0.294997832217552
1.19347091639304	-0.814439294423699
2.79159729280499	0.841927657717863
1.57487925633390	1.06889360300461
1.34254676764379	0.420846092290459
0.920349720485088	0.0191661621187343
1.84736314547313	0.670177571688802
2.00942543830962	0.608358978317639
1.89676252747561	0.683734258412757
1.14933941416981	-0.698984450845645
2.03648602144585	0.861797777652503
1.99500750598298	1.04504903502442
1.86427657131500	0.381543630923962
1.55328823048458	-0.902290843047121
1.51576710303099	0.265903772450991
1.37179554779330	1.01296839034343
0.956095566421630	-0.0222095406309480

累计贡献率

可见:前两个主成分已经占了95%的贡献程度。这两个主成分可以近似表示整个数据。

01-20 21:06