Question
I have a pandas dataframe like this:
COMMIT_ID | FILE_NAME | COMMITTER | CHANGE TYPE
-------------------------------------------------------------
1 | package.json | A | MODIFY
2 | main.js | B | ADD
2 | class.java | B | DELETE
I want the FILE_NAME values as column headers and the CHANGE TYPE values as the cell values:
COMMIT_ID | package.json | main.js | class.java | COMMITTER
-----------------------------------------------------------------------------
1 | MODIFY | NONE | NONE | A
2 | NONE | ADD | DELETE | B
I have tried pandas.pivot_table, but without much success. Is there an easy way to do this?
Answer
I think you need set_index + unstack:
df = (df.set_index(['COMMIT_ID','COMMITTER','FILE_NAME'])['CHANGE TYPE']
        .unstack()
        .reset_index())
print (df)
FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json
0 1 A NaN NaN MODIFY
1 2 B DELETE ADD NaN
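A self-contained sketch of the same set_index + unstack approach, built on the sample data from the question. The fillna('NONE') step and the column reordering are assumptions added only to reproduce the exact layout the question asked for; drop them if NaN and alphabetically sorted file columns are acceptable.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java'],
    'COMMITTER': ['A', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE'],
})

# One row per (COMMIT_ID, COMMITTER), one column per FILE_NAME
wide = (df.set_index(['COMMIT_ID', 'COMMITTER', 'FILE_NAME'])['CHANGE TYPE']
          .unstack()
          .fillna('NONE')              # assumption: mimic the question's NONE placeholder
          .reset_index()
          .rename_axis(None, axis=1))  # drop the FILE_NAME label above the header

# Assumption: order file columns by first appearance and put COMMITTER last,
# as in the question's desired output (unstack sorts them alphabetically)
files = list(df['FILE_NAME'].unique())
wide = wide[['COMMIT_ID'] + files + ['COMMITTER']]
print(wide)
```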
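Solutions with pivot_table: if there are duplicate COMMIT_ID/FILE_NAME pairs, set_index + unstack raises a ValueError, so you need an aggregate function such as sum (concatenates the strings without a separator) or '_'.join (concatenates them with a separator). Each of the following examples starts from this DataFrame: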
print (df)
COMMIT_ID FILE_NAME COMMITTER CHANGE TYPE
0 1 package.json A MODIFY
1 2 main.js B ADD
2 2 class.java B DELETE
3 2 class.java B ADD
df = df.pivot_table(index=['COMMIT_ID','COMMITTER'],
columns='FILE_NAME',
values='CHANGE TYPE',
aggfunc='sum').reset_index()
print (df)
FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json
0 1 A NaN NaN MODIFY
1 2 B DELETEADD ADD NaN
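A sketch showing why the aggregate function matters, using the four-row DataFrame above: set_index + unstack refuses to put two values into one cell, while pivot_table with aggfunc='sum' concatenates them.

```python
import pandas as pd

# Question data plus a second change to class.java in commit 2
df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java', 'class.java'],
    'COMMITTER': ['A', 'B', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE', 'ADD'],
})

# set_index + unstack cannot place two values into the same cell
try:
    df.set_index(['COMMIT_ID', 'COMMITTER', 'FILE_NAME'])['CHANGE TYPE'].unstack()
    unstack_failed = False
except ValueError as exc:
    unstack_failed = True
    print('unstack failed:', exc)

# pivot_table aggregates the duplicates instead; 'sum' concatenates strings
summed = df.pivot_table(index=['COMMIT_ID', 'COMMITTER'],
                        columns='FILE_NAME',
                        values='CHANGE TYPE',
                        aggfunc='sum')
print(summed)
```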
Or, with '_'.join:
df = df.pivot_table(index=['COMMIT_ID','COMMITTER'],
columns='FILE_NAME',
values='CHANGE TYPE',
aggfunc='_'.join).reset_index()
print (df)
FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json
0 1 A NaN NaN MODIFY
1 2 B DELETE_ADD ADD NaN
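pivot_table here behaves roughly like a groupby on the row keys plus FILE_NAME, followed by an unstack. A sketch of that spelling with the same '_'.join aggregation, assuming the four-row example DataFrame; it is an alternative formulation, not something the answer above uses.

```python
import pandas as pd

df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java', 'class.java'],
    'COMMITTER': ['A', 'B', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE', 'ADD'],
})

# Group on every key (row keys + FILE_NAME), join duplicates with '_',
# then move FILE_NAME from the index into the columns
joined = (df.groupby(['COMMIT_ID', 'COMMITTER', 'FILE_NAME'])['CHANGE TYPE']
            .agg('_'.join)
            .unstack()
            .reset_index()
            .rename_axis(None, axis=1))
print(joined)
```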
Aggregating with first also works, but you lose the duplicate values (only the first CHANGE TYPE per cell is kept):
df = df.pivot_table(index=['COMMIT_ID','COMMITTER'],
columns='FILE_NAME',
values='CHANGE TYPE',
aggfunc='first').reset_index()
print (df)
FILE_NAME COMMIT_ID COMMITTER class.java main.js package.json
0 1 A NaN NaN MODIFY
1 2 B DELETE ADD NaN
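Before choosing first, it may be worth checking which rows would be dropped. A sketch using DataFrame.duplicated on the (COMMIT_ID, FILE_NAME) pair; the aggfunc='last' variant assumes the rows are in chronological order, which the question does not state.

```python
import pandas as pd

df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java', 'class.java'],
    'COMMITTER': ['A', 'B', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE', 'ADD'],
})

# keep=False marks every member of a duplicated (COMMIT_ID, FILE_NAME) pair
dupes = df[df.duplicated(['COMMIT_ID', 'FILE_NAME'], keep=False)]
print(dupes)

# Assumption: rows are chronological, so 'last' keeps the most recent change
last = df.pivot_table(index=['COMMIT_ID', 'COMMITTER'],
                      columns='FILE_NAME',
                      values='CHANGE TYPE',
                      aggfunc='last')
print(last)
```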
Finally, to remove the columns name (FILE_NAME) from the header, add rename_axis (shown here on the aggfunc='sum' result):
df = df.rename_axis(None, axis=1)
print (df)
COMMIT_ID COMMITTER class.java main.js package.json
0 1 A NaN NaN MODIFY
1 2 B DELETEADD ADD NaN
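rename_axis(None, axis=1) only clears the name stored on the column index; the column labels themselves do not change. A small sketch that checks this on the three-row sample data, with setting columns.name = None shown as an equivalent in-place alternative:

```python
import pandas as pd

df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java'],
    'COMMITTER': ['A', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE'],
})

wide = df.pivot_table(index=['COMMIT_ID', 'COMMITTER'],
                      columns='FILE_NAME',
                      values='CHANGE TYPE',
                      aggfunc='first').reset_index()
print(wide.columns.name)        # FILE_NAME (the label printed above the header)

renamed = wide.rename_axis(None, axis=1)  # returns a new DataFrame
print(renamed.columns.name)     # None
print(list(renamed.columns))    # column labels are unchanged

# Equivalent in-place alternative
wide.columns.name = None
```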