python - Pandas bug? Mean of a grouped-by int64 column stays as int64 in some circumstances

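I have found some very odd (IMHO) behavior with data loaded into pandas from a CSV file. To protect the innocent, let's say the DataFrame is held in the variable homes and has, among others, the following columns: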

In [143]: homes[['zipcode', 'sqft', 'price']].dtypes
Out[143]:
zipcode     int64
sqft        int64
price       int64
dtype: object

To get the mean price for each zip code, I tried the following:

In [146]: homes.groupby('zipcode')[['price']].mean().head(n=5)
Out[146]:
           price
zipcode
28001     280804
28002     234284
28003     294111
28004    1355927
28005     810164

Oddly, the mean price comes back as int64, as shown here:

In [147]: homes.groupby('zipcode')[['price']].mean().dtypes
Out[147]:
price    int64
dtype: object

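I cannot think of any technical reason why the mean of a set of integers would not be promoted to float. What is more, simply adding another column to the selection makes price come back as float64, as I would expect: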

In [148]: homes.groupby('zipcode')[['price', 'sqft']].mean().dtypes
Out[148]:
price       float64
sqft        float64
dtype: object

                  price          sqft
zipcode
28001     280804.690608  14937.450276
28002     234284.035176   7517.633166
28003     294111.278571  10603.096429
28004    1355927.097792  13104.220820
28005     810164.880952  19928.785714
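
One possible workaround on an affected version is to cast the column to float64 before grouping. This is only a minimal sketch, assuming the problem affects integer input columns only; the data below is made up for illustration and is not the real homes file.

```python
import pandas as pd

# Made-up stand-in for the homes DataFrame (not the original data).
homes = pd.DataFrame({
    "zipcode": [28001, 28001, 28002, 28002, 28002],
    "price": [280000, 281610, 234000, 234500, 234353],
})

# Cast the integer column to float64 before grouping, so the mean
# is computed on a float column rather than an int64 one.
price_as_float = homes["price"].astype("float64")
mean_price = price_as_float.groupby(homes["zipcode"]).mean()

print(mean_price.dtype)
print(mean_price)
```

Checking the dtype of the result, as in the session above, is a quick way to confirm whether the fractional part of the mean has been kept.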

To make sure I was not missing something very obvious, I created another, very simple DataFrame (df), but in that case the behavior does not appear:

In [161]: df[['J','K']].dtypes
Out[161]:
J    int64
K    int64
dtype: object

In [164]: df[['J','K']].head(n=10)
Out[164]:
   J   K
0  0  -9
1  0 -14
2  0   8
3  0 -11
4  0  -7
5 -1   7
6  0   2
7  0   0
8  0   5
9  0   3

In [165]: df.groupby('J')[['K']].mean()
Out[165]:
           K
J
-2 -2.333333
-1  0.466667
 0 -1.030303
 1 -1.750000
 2 -3.000000

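Note that for column K (int64), grouped by J (also int64), the mean comes back as float straight away. The homes DataFrame was read from a CSV file that was given to me, whereas df was created in pandas, written to CSV, and then read back.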
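
The CSV round trip used for df can be sketched roughly as follows, to rule out CSV parsing as the source of the difference. The values and the in-memory buffer are assumptions for illustration; the original df and file are not shown in the question.

```python
import io

import pandas as pd

# Hypothetical small frame standing in for df (values are made up).
df = pd.DataFrame(
    {"J": [0, 0, -1, -1, 1], "K": [-9, 8, 7, 2, 5]},
    dtype="int64",
)

# Write to an in-memory CSV buffer and read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_roundtrip = pd.read_csv(buffer)

# Both columns should still be int64 after the round trip.
print(df_roundtrip.dtypes)

# Dtype of the grouped mean, checked the same way as for homes.
grouped_mean = df_roundtrip.groupby("J")[["K"]].mean()
print(grouped_mean.dtypes)
```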

Last but not least, I am using pandas 0.16.2.

Best answer

As some people suggested in the comments, this is a bug in pandas. I have just reported it here.

As of now, it has been accepted by the pandas team.

Thanks

Regarding "python - Pandas bug? Mean of a grouped-by int64 column stays as int64 in some circumstances", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32809182/
