Problem description
I'm trying to create a simple pivot table with subtotals, Excel-style, but I can't find a way to do it with Pandas. I've tried the solution Wes suggested in another subtotal-related question, but it doesn't give the expected results. Below are the steps to reproduce it:
Create the sample data:
import numpy as np
import pandas as pd

sample_data = {'customer': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'B'],
               'product': ['astro', 'ball', 'car', 'astro', 'ball', 'car', 'astro', 'ball', 'car', 'astro', 'ball', 'car'],
               'week': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
               'qty': [10, 15, 20, 40, 20, 34, 300, 20, 304, 23, 45, 23]}
df = pd.DataFrame(sample_data)
Create the pivot table with margins (it only has the grand total, not a subtotal for each customer (A, B)):
piv = df.pivot_table(index=['customer','product'],columns='week',values='qty',margins=True,aggfunc=np.sum)
week                1    2  All
customer product
A        astro     10  300  310
         ball      15   20   35
         car       20  304  324
B        astro     40   23   63
         ball      20   45   65
         car       34   23   57
All               139  715  854
Then I tried the method Wes McKinney mentioned in another thread, using the stack function:
piv2 = df.pivot_table(index='customer',columns=['week','product'],values='qty',margins=True,aggfunc=np.sum)
piv2.stack('product')
The result has the format I want, but the customer-level and "All" rows don't contain the per-week sums:
week                  1      2    All
customer product
A                   NaN    NaN  669.0
         astro     10.0  300.0    NaN
         ball      15.0   20.0    NaN
         car       20.0  304.0    NaN
B                   NaN    NaN  185.0
         astro     40.0   23.0    NaN
         ball      20.0   45.0    NaN
         car       34.0   23.0    NaN
All                 NaN    NaN  854.0
         astro     50.0  323.0    NaN
         ball      35.0   65.0    NaN
         car       54.0  327.0    NaN
How can I make this work as it would in Excel (sample below), with all the subtotals and totals filled in? What am I missing?
Edit: [image: Excel sample]
Just to note, I am able to make it work with a for loop that filters by customer on each iteration and concatenates the results afterwards, but I hope there is a more direct solution. Thank you.
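For reference, the loop-based workaround mentioned above could look roughly like the sketch below. The original loop was not posted, so this is only an assumption about its shape: the names `pieces`, `result` and `grand_total`, and the labels 'total' / 'Total', are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'customer': ['A', 'A', 'A', 'B', 'B', 'B'] * 2,
    'product': ['astro', 'ball', 'car'] * 4,
    'week': [1] * 6 + [2] * 6,
    'qty': [10, 15, 20, 40, 20, 34, 300, 20, 304, 23, 45, 23],
})

pieces = {}
for customer, group in df.groupby('customer'):
    # Pivot a single customer's rows: products as rows, weeks as columns.
    part = group.pivot_table(index='product', columns='week',
                             values='qty', aggfunc='sum')
    part['Total'] = part.sum(axis=1)   # row totals across weeks
    part.loc['total'] = part.sum()     # this customer's subtotal row
    pieces[customer] = part

# Stack the per-customer blocks; the dict keys become the outer index level.
result = pd.concat(pieces, names=['customer', 'product'])

# Grand total per column, built from the subtotal rows only.
grand_total = result.xs('total', level='product').sum()

print(result)
print(grand_total)
```

This keeps each subtotal directly under its own customer without relying on sorting, at the cost of one pivot per customer.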
Recommended answer
You can do it in one step, but you have to be strategic about the index labels, because the final sort is alphabetical:
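piv = df.pivot_table(index=['customer', 'product'],
                     columns='week',
                     values='qty',
                     margins=True,
                     margins_name='Total',
                     aggfunc=np.sum)

# Append one 'total' row per customer, then sort so each lands after its products.
(pd.concat([piv,
            piv.query('customer != "Total"')
               .groupby(level=0).sum()  # replaces .sum(level=0), removed in pandas 2.0
               .assign(product='total')
               .set_index('product', append=True)])
   .sort_index())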
Output:
week                1    2  Total
customer product
A        astro     10  300    310
         ball      15   20     35
         car       20  304    324
         total     45  624    669
B        astro     40   23     63
         ball      20   45     65
         car       34   23     57
         total     94   91    185
Total             139  715    854
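The caveat about alphabetical sorting can be seen in isolation with a small sketch. The product name 'zebra' is hypothetical and is not part of the sample data. It only shows that `sort_index` orders string labels lexicographically, so the 'total' row ends up last within a customer only when every product name sorts before it.

```python
import pandas as pd

# Hypothetical labels: 'zebra' is an invented product that sorts after 'total'.
idx = pd.MultiIndex.from_tuples(
    [('A', 'astro'), ('A', 'zebra'), ('A', 'total'),
     ('Total', ''), ('B', 'ball'), ('B', 'total')],
    names=['customer', 'product'])

s = pd.Series(range(len(idx)), index=idx).sort_index()
order = list(s.index)
print(order)
# 'total' now sits between 'astro' and 'zebra' for customer A, and the
# 'Total' row is last only because 'T' sorts after 'A' and 'B'.
```

With labels like these, a subtotal label that is guaranteed to sort after every product name (and a grand-total label that sorts after every customer name) would be needed for the layout shown in the output above.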