pandas 以指定的(附加)列作为枢轴

本文介绍了 pandas 以指定的(附加)列作为枢轴的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有两个不同的元组列表，它们被转换为熊猫数据框:

I have a two different list of tuples that are converted into a pandas dataframe:

ls1 = [(1,"A",2),(1,"B",1),(1,"C",3),(2,"A",4),(2,"B",4,),(2,"C",5)]
ls2 = [(1,"A",2),(1,"C",3),(1,"B",1),(1,"D",6),(2,"A",4),(2,"C",5),(2,"B",4,)]
df1 = pandas.DataFrame(ls1, columns=['ID', 'Class', 'count'])
df2 = pandas.DataFrame(ls2, columns=['ID', 'Class', 'count'])

现在，我想从两个数据框中创建一个数据透视表，其列名称分别为"A"，"B"，"C"，"D"类.因此，所有四个列名称(如果可能，也可以按指定的顺序)应该存在于结果数据透视表中.如果没有ID-Class组合的计数，则应使用例如来填充. NaN.

Now I'd like to make a pivot table from both dataframes with the column names for the classes "A", "B", "C", "D". So, all four column names (if possible also in a specified order) should exist in the resulting pivot table. If there are no counts for a ID-Class combination these should be filled with e.g. NaN.

dfpivot1 = df1.pivot(index='ID', columns='Class', values='count')
dfpivot2 = df2.pivot(index='ID', columns='Class', values='count')

>>> dfpivot1
Class  A  B  C
ID
1      2  1  3
2      4  4  5
>>>

使用熊猫的.pivot提供df1的数据透视表，但是只有三个Class列("A"，"B"，"C").因此，需要修改dfpivot1使其具有列"A"，"B"，"C"和"D"，从而与dfpivot2的列完全匹配.由于我是从元组列表开始的，其他方法(不使用pandas)可能也会很有趣.

Using the .pivot from pandas provides a pivot table for df1, but with only three Class-columns ("A","B","C"). So, dfpivot1 needs to be modified to have the columns "A","B","C" AND "D" and thus, exactly match the columns of dfpivot2. As I am starting from a list of tuples also other approaches (without using pandas) might be interesting.

推荐答案

我认为您需要添加 reindex_axis 或 reindex 填充所有缺少的列的NaN:

I think you need add reindex_axis or reindex for fill NaNs for all missing columns:

cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count').reindex_axis(cols, axis=1)
print (dfpivot1)
Class  A  B  C   D
ID
1      2  1  3 NaN
2      4  4  5 NaN

也可以指定fill_value参数:

cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count')
              .reindex_axis(cols, fill_value=0, axis=1)
print (dfpivot1)
Class  A  B  C  D
ID
1      2  1  3  0
2      4  4  5  0

cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count').reindex(columns=cols)
print (dfpivot1)
Class  A  B  C   D
ID
1      2  1  3 NaN
2      4  4  5 NaN

这篇关于 pandas 以指定的(附加)列作为枢轴的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！