Problem description
So I have a pandas DataFrame, df, with columns that represent taxonomic classification (e.g. Kingdom, Phylum, Class, etc.). I also have a list of taxonomic labels in the order I would like the DataFrame's rows to be sorted by.
The list looks like this:
class_list=['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes', 'Clostridia', 'Bacilli', 'Actinobacteria', 'Betaproteobacteria', 'delta/epsilon subdivisions', 'Synergistia', 'Mollicutes', 'Nitrospira', 'Spirochaetia', 'Thermotogae', 'Aquificae', 'Fimbriimonas', 'Gemmatimonadetes', 'Dehalococcoidia', 'Oscillatoriophycideae', 'Chlamydiae', 'Nostocales', 'Thermodesulfobacteria', 'Erysipelotrichia', 'Chlorobi', 'Deinococci']
This list corresponds to the DataFrame column df['Class']. I would like to sort all the rows of the whole DataFrame based on the order of the list, since df['Class'] is currently in a different order. What would be the best way to do this?
Recommended answer
You can make the Class column the index:
df = df.set_index('Class')
and then use df.loc to reindex the DataFrame with class_list:
df.loc[class_list]
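One caveat: in pandas 1.0 and later, df.loc[class_list] raises a KeyError if any label in class_list is absent from the index, which is quite possible with a 24-entry list. A minimal sketch, assuming you simply want to skip the missing labels: filter the list down to the labels that are present (keeping its order), select with .loc, then call reset_index to turn Class back into a regular column. The small df and shortened class_list here are made-up stand-ins.

```python
import pandas as pd

# Illustrative stand-in data; 'Clostridia' is deliberately absent from df.
df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'],
                   'Number': [3, 5, 6]})
class_list = ['Bacteroidetes', 'Clostridia', 'Negativicutes', 'Gammaproteobacteria']

indexed = df.set_index('Class')

# Keep only labels that occur in the index, preserving class_list order,
# so .loc does not raise KeyError for 'Clostridia'.
present = [c for c in class_list if c in indexed.index]

# Select rows in list order, then move Class back out of the index.
result = indexed.loc[present].reset_index()
print(result)
```

If you would rather see a row for every listed class, indexed.reindex(class_list) is the usual alternative; it fills missing classes with NaN instead of dropping them.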
Minimal example:
>>> import pandas as pd
>>> df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'], 'Number': [3, 5, 6]})
>>> df
                 Class  Number
0  Gammaproteobacteria       3
1        Bacteroidetes       5
2        Negativicutes       6
>>> df = df.set_index('Class')
>>> df.loc[['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']]
                     Number
Class
Bacteroidetes             5
Negativicutes             6
Gammaproteobacteria       3
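An alternative that keeps Class as an ordinary column is to sort by an ordered Categorical built from class_list. This is a sketch, not part of the original answer, and the data below is made up: rows sharing a class are kept (in their original relative order thanks to the stable sort), and classes missing from the list become NaN in the categorical and are placed last. The helper column name _order is arbitrary.

```python
import pandas as pd

# Made-up data: a duplicated class and one class ('Chlorobi') not in class_list.
df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes',
                             'Bacteroidetes', 'Chlorobi'],
                   'Number': [3, 5, 6, 7, 1]})
class_list = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']

# Ordered categorical: sorting follows the position of each class in class_list;
# classes not in the list become NaN.
order = pd.Categorical(df['Class'], categories=class_list, ordered=True)

# Sort on a temporary column, keep ties stable (NaN sorts last by default),
# then drop the helper so df's original columns are unchanged.
result = (df.assign(_order=order)
            .sort_values('_order', kind='stable')
            .drop(columns='_order'))
print(result)
```

The original index labels travel with the rows; append .reset_index(drop=True) if you want a fresh 0..n-1 index.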