Problem description
In Python, I have a pandas DataFrame similar to the following:
Item  | shop1 | shop2 | shop3 | Category
-----------------------------------------
Shoes | 45    | 50    | 53    | Clothes
TV    | 200   | 300   | 250   | Technology
Book  | 20    | 17    | 21    | Books
phone | 300   | 350   | 400   | Technology
Here shop1, shop2 and shop3 are the costs of each item in different shops. Now, after some data cleaning, I need to return a DataFrame like this one:
Category (index)| size| sum| mean | std
----------------------------------------
where size is the number of items in each Category, and sum, mean and std are the corresponding functions applied across the 3 shops. How can I do these operations with the split-apply-combine pattern (groupby, aggregate, apply, ...)?
Can someone help me out? I'm going crazy with this one... thank you!
Recommended answer
Edited for pandas 0.22+, where passing a dictionary to a groupby aggregation in order to rename the results is deprecated.
We set up a very similar dictionary, using its keys to specify the functions and the dictionary itself to rename the resulting columns.
# agg receives the function names; rename then maps them to the display names.
rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')
df.set_index(['Category', 'Item']).stack().groupby('Category') \
  .agg(list(rnm_cols)).rename(columns=rnm_cols)
            Size   Sum        Mean        Std
Category
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678
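For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand and names the intermediate result so the reshaping step is visible; the variable names stacked and summary are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Move the non-price columns into the index, then stack the three shop
# columns into one long Series: one value per (Category, Item, shop).
stacked = df.set_index(['Category', 'Item']).stack()

# Group on the 'Category' index level and apply all four functions at once.
summary = stacked.groupby(level='Category').agg(['size', 'sum', 'mean', 'std'])
print(summary)
```

Because every price ends up in a single Series, size counts prices (items times shops) rather than items, which is why Technology shows 6 and not 2.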
Option 1
Use agg
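# Renaming through a dict passed to agg was deprecated and later removed from pandas;
# keyword (named) aggregation, available since pandas 0.25, does the same job.
agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std')
df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(**agg_funcs)
            Size   Sum        Mean        Std
Category
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678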
Option 2
More for less
Use describe
# In pandas 0.20 and later a grouped describe already returns one row per group, so no unstack is needed.
df.set_index(['Category', 'Item']).stack().groupby(level=0).describe()
            count        mean        std    min    25%    50%    75%    max
Category
Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0
Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0
Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0
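Both options reshape the data with set_index followed by stack. As a side note, the same long format can probably be reached with melt instead, which some people find easier to read. This is a sketch of that alternative rather than the method used above; the column names shop and cost are made up for the example.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Reshape wide -> long: one row per (Item, Category, shop) with its cost.
# 'shop' and 'cost' are hypothetical names chosen for this sketch.
long_df = df.melt(id_vars=['Item', 'Category'], var_name='shop', value_name='cost')

# Aggregate the single cost column per category.
summary = long_df.groupby('Category')['cost'].agg(['size', 'sum', 'mean', 'std'])
print(summary)
```

The numbers should match the stack-based versions, since both approaches feed the same twelve prices into the groupby.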