I don't understand why apply and transform return different dtypes when called on the same data frame. The way I had explained the two functions to myself went something along the lines of "apply collapses the data, and transform does exactly the same thing as apply but preserves the original index and doesn't collapse." Consider the following.

    df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                       'cat': [1,1,0,0,1,0,0,0,0,1]})

Let's identify those ids which have a nonzero entry in the cat column.

    >>> df.groupby('id')['cat'].apply(lambda x: (x == 1).any())
    id
    1     True
    2     True
    3    False
    4     True
    Name: cat, dtype: bool

Great.
If we wanted to create an indicator column, however, we could do the following.

    >>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
    0    1
    1    1
    2    1
    3    1
    4    1
    5    1
    6    1
    7    0
    8    0
    9    1
    Name: cat, dtype: int64

I don't understand why the dtype is now int64 instead of the boolean returned by the any() function.

When I change the original data frame to contain some booleans (note that the zeros remain), the transform approach returns booleans in an object column. This is an extra mystery to me since all of the values are boolean, but it's listed as object, apparently to match the dtype of the original mixed-type column of integers and booleans.

    df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                       'cat': [True,True,0,0,True,0,0,0,0,True]})

    >>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
    0     True
    1     True
    2     True
    3     True
    4     True
    5     True
    6     True
    7    False
    8    False
    9     True
    Name: cat, dtype: object

However, when I use all booleans, the transform function returns a boolean column.

    df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                       'cat': [True,True,False,False,True,False,False,False,False,True]})

    >>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
    0     True
    1     True
    2     True
    3     True
    4     True
    5     True
    6     True
    7    False
    8    False
    9     True
    Name: cat, dtype: bool

Using my acute pattern-recognition skills, it appears that the dtype of the resulting column mirrors that of the original column. I would appreciate any hints about why this occurs or what's going on under the hood in the transform function. Cheers.
Solution

It looks like SeriesGroupBy.transform() tries to cast the result to the same dtype as the original column, but DataFrameGroupBy.transform() doesn't seem to do that:

    In [139]: df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
    Out[139]:
    0    1
    1    1
    2    1
    3    1
    4    1
    5    1
    6    1
    7    0
    8    0
    9    1
    Name: cat, dtype: int64

    #                         v       v
    In [140]: df.groupby('id')[['cat']].transform(lambda x: (x == 1).any())
    Out[140]:
         cat
    0   True
    1   True
    2   True
    3   True
    4   True
    5   True
    6   True
    7  False
    8  False
    9   True

    In [141]: df.dtypes
    Out[141]:
    cat    int64
    id     int64
    dtype: object
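If the goal is simply a boolean indicator column, one way to avoid depending on this casting behaviour is sketched below. This is not from the original answer. It assumes the same example frame, and the names has_cat and has_cat_cast are made up for illustration. The first option builds the boolean mask before grouping, so the column being transformed is already bool. The second keeps the lambda and casts the result explicitly. Whether the un-cast lambda result comes back as int64 or bool may depend on the pandas version, so neither option relies on it.

```python
import pandas as pd

df = pd.DataFrame({
    'id':  [1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
    'cat': [1, 1, 0, 0, 1, 0, 0, 0, 0, 1],
})

# Option 1: compare first, so the grouped column is already boolean,
# then broadcast the built-in 'any' aggregation back to every row.
has_cat = df['cat'].eq(1).groupby(df['id']).transform('any')

# Option 2: keep the original lambda and cast the result explicitly,
# whatever dtype transform happened to return.
has_cat_cast = (
    df.groupby('id')['cat']
      .transform(lambda x: (x == 1).any())
      .astype(bool)
)

df['has_cat'] = has_cat
print(df)
```

Option 1 also avoids calling a Python lambda once per group, which tends to matter on larger frames.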
07-16 19:06