问题描述
这显然很简单,但作为一个麻木的新手,我被卡住了.
This is obviously simple, but as a numpy newbe I'm getting stuck.
我有一个 CSV 文件,其中包含 3 列、州、办公室 ID 和该办公室的销售额.
I have a CSV file that contains 3 columns, the State, the Office ID, and the Sales for that office.
我想计算给定州每个办公室的销售额百分比(每个州的所有百分比总和为 100%).
I want to calculate the percentage of sales per office in a given state (total of all percentages in each state is 100%).
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
'office_id': range(1, 7) * 2,
'sales': [np.random.randint(100000, 999999)
for _ in range(12)]})
df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
返回:
sales
state office_id
AZ 2 839507
4 373917
6 347225
CA 1 798585
3 890850
5 454423
CO 1 819975
3 202969
5 614011
WA 2 163942
4 369858
6 959285
我似乎无法弄清楚如何达到"到 groupby
的 state
级别来总计 sales
为整个 state
计算分数.
I can't seem to figure out how to "reach up" to the state
level of the groupby
to total up the sales
for the entire state
to calculate the fraction.
推荐答案
Paul H 的回答 是正确的,您将拥有创建第二个 groupby
对象,但您可以用更简单的方式计算百分比——只需 groupby
state_office
并划分 sales
列按其总和.复制 Paul H 答案的开头:
Paul H's answer is right that you will have to make a second groupby
object, but you can calculate the percentage in a simpler way -- just groupby
the state_office
and divide the sales
column by its sum. Copying the beginning of Paul H's answer:
# From Paul H
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
'office_id': list(range(1, 7)) * 2,
'sales': [np.random.randint(100000, 999999)
for _ in range(12)]})
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
# Change: groupby state_office and divide by sum
state_pcts = state_office.groupby(level=0).apply(lambda x:
100 * x / float(x.sum()))
返回:
sales
state office_id
AZ 2 16.981365
4 19.250033
6 63.768601
CA 1 19.331879
3 33.858747
5 46.809373
CO 1 36.851857
3 19.874290
5 43.273852
WA 2 34.707233
4 35.511259
6 29.781508
这篇关于 pandas 占 groupby 总数的百分比的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!