This article covers how to preserve row headers after running PCA in sklearn.

Question

I have an array like this:

sampleA 1 2 2 1 
sampleB 1 3 2 1
sampleC 2 3 1 2

My goal is to run PCA across the samples and see how they cluster. However, I need to preserve the sample names in the row headers. Is there any way to do this? The desired PCA result includes the row headers:

sampleA 0.13 0.1
sampleB 0.1 0.4
sampleC 0.1 0.1

Currently I am just running these two simple lines:

my_pca = PCA(n_components=8)
trans = my_pca.fit_transform(in_array)

Recommended answer

According to the source, your input is converted with np.array() before PCA is performed. So you will lose the row index during PCA.fit_transform(X) even if you pass a structured array or a pandas DataFrame. However, the order of your data is preserved, so you can attach the index back afterwards:

import io

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

s = """sampleA 1 2 2 1
sampleB 1 3 2 1
sampleC 2 3 1 2"""
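# Read the sample names (column 0) into the index, keeping them out of the data.
in_array = pd.read_table(io.StringIO(s), sep=' ', header=None, index_col=0)
my_pca = PCA(n_components=2)
# fit_transform returns a plain NumPy array whose rows are in the same order as in_array.
trans = my_pca.fit_transform(in_array)
# Reattach the original index to the transformed rows.
df = pd.DataFrame(trans, index=in_array.index)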
print(df)
#                 0         1
# 0                          
# sampleA -0.773866 -0.422976
# sampleB -0.424531  0.514022
# sampleC  1.198397 -0.091046

