本文介绍了 pandas 占 groupby 总数的百分比的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这显然很简单,但作为一个麻木的新手,我被卡住了.

This is obviously simple, but as a numpy newbe I'm getting stuck.

我有一个 CSV 文件,其中包含 3 列、州、办公室 ID 和该办公室的销售额.

I have a CSV file that contains 3 columns, the State, the Office ID, and the Sales for that office.

我想计算给定州每个办公室的销售额百分比(每个州的所有百分比总和为 100%).

I want to calculate the percentage of sales per office in a given state (total of all percentages in each state is 100%).

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': range(1, 7) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)]})

df.groupby(['state', 'office_id']).agg({'sales': 'sum'})

返回:

                  sales
state office_id
AZ    2          839507
      4          373917
      6          347225
CA    1          798585
      3          890850
      5          454423
CO    1          819975
      3          202969
      5          614011
WA    2          163942
      4          369858
      6          959285

我似乎无法弄清楚如何达到"到 groupbystate 级别来总计 sales为整个 state 计算分数.

I can't seem to figure out how to "reach up" to the state level of the groupby to total up the sales for the entire state to calculate the fraction.

推荐答案

Paul H 的回答 是正确的,您将拥有创建第二个 groupby 对象,但您可以用更简单的方式计算百分比——只需 groupby state_office 并划分 sales 列按其总和.复制 Paul H 答案的开头:

Paul H's answer is right that you will have to make a second groupby object, but you can calculate the percentage in a simpler way -- just groupby the state_office and divide the sales column by its sum. Copying the beginning of Paul H's answer:

# From Paul H
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)]})
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
# Change: groupby state_office and divide by sum
state_pcts = state_office.groupby(level=0).apply(lambda x:
                                                 100 * x / float(x.sum()))

返回:

                     sales
state office_id
AZ    2          16.981365
      4          19.250033
      6          63.768601
CA    1          19.331879
      3          33.858747
      5          46.809373
CO    1          36.851857
      3          19.874290
      5          43.273852
WA    2          34.707233
      4          35.511259
      6          29.781508

这篇关于 pandas 占 groupby 总数的百分比的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-01 13:38