I have a dataframe with Age and Marital_Status columns. Age is int and Marital_Status is a string with 8 unique values, e.g. Married, Single, etc.
After dividing the ages into groups of interval 10:
df['age_group'] = pd.cut(df.Age, bins=[0,10,20,30,40,50,60,70,80,90])
I grouped the marital status by age group:
df.groupby(['age_group'])['Marital_Status'].value_counts()
which returns the following error:
ValueError: operands could not be broadcast together with shape (9,) (7,)
When I tried a smaller number of age groups, where the shape is <= (7,), the groupby works fine, e.g.
df['age_group'] = pd.cut(df.Age, bins=[10,20,30,40,50,60,70,80])
I presume that the first shape (x,) must be smaller than the second shape (y,). Can anyone please explain why?
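One plausible reading of the two shapes, consistent with the observed=True fix: ten bin edges give pd.cut nine interval categories, while the data may fill only seven of them. The sketch below uses made-up ages (the asker's real data is unknown) to show how the number of defined categories can differ from the number that actually occur.

```python
import pandas as pd

# Hypothetical data: these ages only span 18-79, so 2 of the 9 ten-year bins
# ((0, 10] and (80, 90]) stay empty.
df = pd.DataFrame({
    "Age": [18, 25, 33, 41, 47, 52, 64, 79],
    "Marital_Status": ["Single", "Single", "Married", "Married",
                       "Divorced", "Married", "Widowed", "Widowed"],
})
df["age_group"] = pd.cut(df.Age, bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

n_categories = len(df["age_group"].cat.categories)  # every interval defined by the bins
n_observed = df["age_group"].nunique()               # intervals that actually contain ages
print(n_categories, n_observed)
```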
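The problem (maybe a bug?) is in DataFrame.groupby: the default parameter is observed=False, but you need to work only with the categories that actually occur in the data, so pass observed=True. From the groupby documentation:
observed : bool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.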
Sample:
import pandas as pd
df = pd.DataFrame({'Marital_Status':['stat1'] * 50,
'Age': range(50)})
df['age_group'] = pd.cut(df.Age,bins=[10,20,30,40,50,60,70,80])
# print (df)
print (df.groupby(['age_group'], observed=True)['Marital_Status'].value_counts())
age_group Marital_Status
(10, 20] stat1 10
(20, 30] stat1 10
(30, 40] stat1 10
(40, 50] stat1 9
Name: Marital_Status, dtype: int64
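A different technique, not part of the original answer: drop the empty interval categories before grouping, using the categorical accessor's remove_unused_categories. Assuming the error comes from those empty categories (as the observed=True fix suggests), even observed=False then has nothing extra to show. This is a sketch on the same sample data.

```python
import pandas as pd

df = pd.DataFrame({"Marital_Status": ["stat1"] * 50, "Age": range(50)})
df["age_group"] = pd.cut(df.Age, bins=[10, 20, 30, 40, 50, 60, 70, 80])

# Remove interval categories that no row falls into: (50, 60], (60, 70], (70, 80].
# Ages 0-10 fall outside every bin and become NaN; groupby drops them by default.
df["age_group"] = df["age_group"].cat.remove_unused_categories()

# observed=False is passed explicitly: only the 4 remaining categories can appear.
counts = df.groupby("age_group", observed=False)["Marital_Status"].value_counts()
print(counts)
```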
Alternatively, group by both columns and use size, which makes the difference easy to check:
print (df.groupby(['age_group', 'Marital_Status']).size())
age_group Marital_Status
(10, 20] stat1 10
(20, 30] stat1 10
(30, 40] stat1 10
(40, 50] stat1 9
(50, 60] stat1 0
(60, 70] stat1 0
(70, 80] stat1 0
dtype: int64
print (df.groupby(['age_group', 'Marital_Status'], observed=True).size())
age_group Marital_Status
(10, 20] stat1 10
(20, 30] stat1 10
(30, 40] stat1 10
(40, 50] stat1 9
dtype: int64
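To list exactly which combinations observed=True removes, one option is to keep only the zero-size rows from the observed=False result. A minimal sketch on the same sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Marital_Status": ["stat1"] * 50, "Age": range(50)})
df["age_group"] = pd.cut(df.Age, bins=[10, 20, 30, 40, 50, 60, 70, 80])

all_sizes = df.groupby(["age_group", "Marital_Status"], observed=False).size()
observed_sizes = df.groupby(["age_group", "Marital_Status"], observed=True).size()

# Rows that exist only because observed=False keeps unused categories.
empty = all_sizes[all_sizes == 0]
print(empty)
print(len(all_sizes) - len(observed_sizes))
```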