This article covers how to sort a pandas DataFrame by the order of a list.

Problem description

So I have a pandas DataFrame, df, with columns that represent taxonomic classification (e.g. Kingdom, Phylum, Class, etc.). I also have a list of taxonomic labels in the order I would like the DataFrame to be sorted by.

The list looks like this:

class_list=['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes', 'Clostridia', 'Bacilli', 'Actinobacteria', 'Betaproteobacteria', 'delta/epsilon subdivisions', 'Synergistia', 'Mollicutes', 'Nitrospira', 'Spirochaetia', 'Thermotogae', 'Aquificae', 'Fimbriimonas', 'Gemmatimonadetes', 'Dehalococcoidia', 'Oscillatoriophycideae', 'Chlamydiae', 'Nostocales', 'Thermodesulfobacteria', 'Erysipelotrichia', 'Chlorobi', 'Deinococci']

This list corresponds to the DataFrame column df['Class']. I would like to sort all the rows of the DataFrame based on the order of the list, as df['Class'] is currently in a different order. What would be the best way to do this?

Recommended answer

You can set the Class column as the index

df = df.set_index('Class')

and then use df.loc to reindex the DataFrame with class_list:

df.loc[class_list]
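One caveat, as far as I know: in recent pandas versions, passing a list to df.loc raises a KeyError if any label in the list is missing from the index. The sketch below works around that by filtering the list first; the data and the names indexed, present, and result are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: 'Clostridia' is in the list but not in the DataFrame.
df = pd.DataFrame({
    "Class": ["Negativicutes", "Gammaproteobacteria", "Bacteroidetes"],
    "Number": [6, 3, 5],
})
class_list = ["Gammaproteobacteria", "Bacteroidetes", "Negativicutes", "Clostridia"]

indexed = df.set_index("Class")

# Keep only the labels that actually occur in the index, preserving list order.
present = [label for label in class_list if label in indexed.index]

# Select rows in list order, then turn 'Class' back into an ordinary column.
result = indexed.loc[present].reset_index()
print(result)
```

If every label in class_list is guaranteed to be in df['Class'], the filtering step is unnecessary and df.loc[class_list] is enough.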

Minimal example:

>>> import pandas as pd
>>> df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'], 'Number': [3, 5, 6]})
>>> df
                 Class  Number
0  Gammaproteobacteria       3
1        Bacteroidetes       5
2        Negativicutes       6

>>> df = df.set_index('Class')
>>> df.loc[['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']]
                     Number
Class
Bacteroidetes             5
Negativicutes             6
Gammaproteobacteria       3
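If you would rather keep Class as a regular column, an alternative (not part of the answer above) is to convert it to an ordered categorical and sort on it. This is only a sketch with made-up data; the names ordered and sorted_df are illustrative.

```python
import pandas as pd

# Hypothetical data with a repeated Class value.
df = pd.DataFrame({
    "Class": ["Negativicutes", "Gammaproteobacteria", "Bacteroidetes", "Negativicutes"],
    "Number": [6, 3, 5, 7],
})
class_list = ["Gammaproteobacteria", "Bacteroidetes", "Negativicutes"]

# An ordered categorical sorts by position in class_list, not alphabetically.
ordered = df.copy()
ordered["Class"] = pd.Categorical(ordered["Class"], categories=class_list, ordered=True)

# mergesort is stable, so rows sharing a Class keep their original relative order.
sorted_df = ordered.sort_values("Class", kind="mergesort").reset_index(drop=True)
print(sorted_df)
```

Note that any value in df['Class'] that does not appear in class_list becomes NaN under this conversion, so it is worth checking the list covers the column first.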

