Pandas: Groupby returns an error after pd.cut

Problem description

I have a DataFrame with two columns, Age and Marital_Status. Age is an int and Marital_Status is a string with 8 unique values, e.g. Married, Single, etc.

I first binned Age into age groups of width 10:

df['age_group'] = pd.cut(df.Age,bins=[0,10,20,30,40,50,60,70,80,90])

Then I grouped the marital status by age group:

df.groupby(['age_group'])['Marital_Status'].value_counts()

This returns the following error:

ValueError: operands could not be broadcast together with shape (9,) (7,)

When I tried a smaller number of age groups, so that the first shape is <= (7,), the groupby works fine, e.g.:

df['age_group'] = pd.cut(df.Age,bins=[10,20,30,40,50,60,70,80])

I presume that the first shape (x,) must be smaller than second shape (y,). Can anyone please explain why?

Solution

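The problem (possibly a bug) is in DataFrame.groupby: in the pandas version used here the default is observed=False, but you need to work only with the categories that actually occur in the data. The documentation describes the parameter as follows:

observed : bool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.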
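To see why observed matters after pd.cut, it may help to look at the column that pd.cut produces. The following is a minimal sketch with made-up ages (the values and variable names are assumptions for illustration, not the asker's data): pd.cut returns a categorical column that declares one category per bin, including bins that contain no rows.

```python
import pandas as pd

# Hypothetical ages; only four of the nine bins receive any value.
ages = pd.Series([15, 25, 35, 45], name="Age")
binned = pd.cut(ages, bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

# Every bin is declared as a category, whether or not it is used.
n_declared = len(binned.cat.categories)

# Dropping the unused categories leaves only the bins that occur.
n_observed = len(binned.cat.remove_unused_categories().cat.categories)

print(n_declared, n_observed)
```

With observed=False, groupby works with all declared categories; with observed=True, it works only with the ones that occur.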

Sample:

import pandas as pd

df = pd.DataFrame({'Marital_Status':['stat1'] * 50,
                   'Age': range(50)})

df['age_group'] = pd.cut(df.Age,bins=[10,20,30,40,50,60,70,80])
# print (df)

print (df.groupby(['age_group'], observed=True)['Marital_Status'].value_counts())
age_group  Marital_Status
(10, 20]   stat1             10
(20, 30]   stat1             10
(30, 40]   stat1             10
(40, 50]   stat1              9
Name: Marital_Status, dtype: int64
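For data with several marital statuses, as in the question, the same observed=True call can be followed by unstack to get one row per age group and one column per status. This is only a sketch: the six rows below are invented for illustration, and the unstack step is an addition that is not part of the original answer.

```python
import pandas as pd

# Hypothetical data with two statuses instead of the asker's eight.
df = pd.DataFrame({
    "Age": [12, 18, 25, 27, 33, 41],
    "Marital_Status": ["Single", "Single", "Married",
                       "Single", "Married", "Married"],
})
df["age_group"] = pd.cut(df["Age"], bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

# Count statuses per observed age group, then pivot statuses into columns.
table = (
    df.groupby("age_group", observed=True)["Marital_Status"]
      .value_counts()
      .unstack(fill_value=0)
)
print(table)
```

Combinations that never occur are filled with 0 by fill_value, so the table stays rectangular without bringing back the empty bins.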

The difference is easier to see with an alternative solution that groups by both columns and calls size():

print (df.groupby(['age_group', 'Marital_Status']).size())
age_group  Marital_Status
(10, 20]   stat1             10
(20, 30]   stat1             10
(30, 40]   stat1             10
(40, 50]   stat1              9
(50, 60]   stat1              0
(60, 70]   stat1              0
(70, 80]   stat1              0
dtype: int64

print (df.groupby(['age_group', 'Marital_Status'], observed=True).size())
age_group  Marital_Status
(10, 20]   stat1             10
(20, 30]   stat1             10
(30, 40]   stat1             10
(40, 50]   stat1              9
dtype: int64
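Another option, not mentioned in the answer above, is to drop the empty bins from the categorical column itself before grouping, using the .cat.remove_unused_categories() accessor method. The sketch below reuses the sample data; observed=False is passed explicitly only to show that the empty groups are gone even when all declared categories are displayed.

```python
import pandas as pd

df = pd.DataFrame({"Marital_Status": ["stat1"] * 50,
                   "Age": range(50)})
df["age_group"] = pd.cut(df["Age"], bins=[10, 20, 30, 40, 50, 60, 70, 80])

# Remove the bins that contain no rows: (50, 60], (60, 70], (70, 80].
df["age_group"] = df["age_group"].cat.remove_unused_categories()

# Even with observed=False there are no zero-count groups left.
counts = df.groupby(["age_group", "Marital_Status"], observed=False).size()
print(counts)
```

This changes the column for every later operation, so passing observed=True at the groupby call is usually the less invasive choice.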
