Problem description
I have two different lists of tuples, each converted into a pandas DataFrame:
import pandas
ls1 = [(1,"A",2),(1,"B",1),(1,"C",3),(2,"A",4),(2,"B",4),(2,"C",5)]
ls2 = [(1,"A",2),(1,"C",3),(1,"B",1),(1,"D",6),(2,"A",4),(2,"C",5),(2,"B",4)]
df1 = pandas.DataFrame(ls1, columns=['ID', 'Class', 'count'])
df2 = pandas.DataFrame(ls2, columns=['ID', 'Class', 'count'])
Now I'd like to make a pivot table from each dataframe with the classes "A", "B", "C", "D" as column names. All four column names (if possible, in a specified order) should exist in the resulting pivot table. If there is no count for an ID-Class combination, the cell should be filled with a placeholder such as NaN.
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count')
dfpivot2 = df2.pivot(index='ID', columns='Class', values='count')
>>> dfpivot1
Class  A  B  C
ID
1      2  1  3
2      4  4  5
Using .pivot from pandas gives a pivot table for df1, but with only three Class columns ("A", "B", "C"). So dfpivot1 needs to be modified to have the columns "A", "B", "C" and "D", and thus exactly match the columns of dfpivot2. Since I am starting from a list of tuples, other approaches (without using pandas) would also be interesting.
Recommended answer
I think you need to add reindex_axis or reindex to fill in NaN for all missing columns (note that reindex_axis was deprecated in pandas 0.21 and removed in pandas 1.0, so on current versions use reindex):
cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count').reindex_axis(cols, axis=1)
print (dfpivot1)
Class  A  B  C   D
ID
1      2  1  3 NaN
2      4  4  5 NaN
You can also specify the fill_value parameter:
cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count').reindex_axis(cols, fill_value=0, axis=1)
print (dfpivot1)
Class  A  B  C  D
ID
1      2  1  3  0
2      4  4  5  0
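Since reindex_axis is not available in recent pandas releases, the fill_value variant can probably be written with reindex instead. The following is only a minimal sketch under that assumption; the variable name pivot_filled is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

# Same sample data as df1 in the question.
ls1 = [(1, "A", 2), (1, "B", 1), (1, "C", 3), (2, "A", 4), (2, "B", 4), (2, "C", 5)]
df1 = pd.DataFrame(ls1, columns=["ID", "Class", "count"])

# Desired columns, in the desired order.
cols = ["A", "B", "C", "D"]

# Pivot, then force the column set; classes absent from df1 are filled with 0.
pivot_filled = (
    df1.pivot(index="ID", columns="Class", values="count")
    .reindex(columns=cols, fill_value=0)
)
print(pivot_filled)
```

Passing fill_value=0 avoids introducing NaN, so the added column "D" holds integers rather than floats.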
The same result with reindex:
cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count').reindex(columns=cols)
print (dfpivot1)
Class  A  B  C   D
ID
1      2  1  3 NaN
2      4  4  5 NaN
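The question also asks about approaches that do not use pandas. One possible sketch works directly on the list of tuples with a plain dictionary; the helper name pivot_tuples and its fill parameter are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

# Same sample data as ls1 in the question.
ls1 = [(1, "A", 2), (1, "B", 1), (1, "C", 3), (2, "A", 4), (2, "B", 4), (2, "C", 5)]
cols = ["A", "B", "C", "D"]


def pivot_tuples(rows, columns, fill=None):
    """Hypothetical helper: map each ID to a list of counts ordered like `columns`.

    ID-Class combinations that do not occur in `rows` get the `fill` value.
    """
    counts = defaultdict(dict)
    for row_id, cls, count in rows:
        counts[row_id][cls] = count
    return {
        row_id: [by_class.get(col, fill) for col in columns]
        for row_id, by_class in sorted(counts.items())
    }


table1 = pivot_tuples(ls1, cols)
print(table1)
```

Here None plays the role of NaN; pass fill=0 (or float("nan")) if a different placeholder is preferred.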