Why is pandas.melt messing with my dtypes?

Question

I have some pivot code that is failing with the error:

pandas.core.base.DataError: No numeric types to aggregate

I have tracked the problem down to an earlier call to pandas.melt.

Here are the dtypes before the melt:

frame.dtypes
user_id                           Int64
feature                          object
seconds_since_start_assigned      Int32
total                           float32
programme_ids                    object
q1                                Int32
q2                                Int32
q3                                Int32
q4                                Int32
q5                                Int32
q6                                Int32
q7                                Int32
q8                                Int32
q9                                Int32
week                              Int32

Now for the melt:

frame1 = pd.melt(
     frame,
     id_vars=['user_id', 'week'],
     value_vars=['q1', 'q2', 'q3', 'q4', 'q5', 'q6', 'q7', 'q8', 'q9'],
     var_name='question',
     value_name='score')
frame1.dtypes
user_id     object
week        object
question    object
score       object

Why has the call to melt replaced the Int32 dtype I need for score with object?

Recommended answer

You are using the nullable integer data type (note the capital 'I' in 'Int32'). This is still a fairly new data type, so not all of the functionality is there yet. In particular, there is a big warning under the Construction section of the nullable integer documentation: Series cannot infer a nullable integer dtype, though perhaps it will someday:

In the future, we may provide an option for Series to infer a nullable-integer dtype.

We can see this ourselves. Series does not infer the correct type and falls back to object, the only dtype it can use to hold both the integers and the missing value. pd.array infers it correctly, though.

import pandas as pd
arr = [1, pd.NA, 4]  # pd.NA is the public missing-value singleton

pd.Series(arr)
#0       1
#1    <NA>
#2       4
#dtype: object   #  <- Did not infer the type :(

pd.array(arr)
#<IntegerArray>
#[1, <NA>, 4]
#Length: 3, dtype: Int64
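
As a small sketch of the same point, passing the dtype explicitly sidesteps the missing inference. The variable names here are illustrative and not from the original post:

```python
import pandas as pd

values = [1, pd.NA, 4]

# Without a dtype, Series falls back to object for this mixed list.
inferred = pd.Series(values)

# With an explicit nullable dtype, the missing value is kept as <NA>
# and the integers stay integers.
explicit = pd.Series(values, dtype="Int32")

print(inferred.dtype)  # object
print(explicit.dtype)  # Int32
```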

So you melt, get a Series, and pandas cannot infer the dtype, so the column is cast to object after the melt. For now, you will have to explicitly convert back to 'Int32'.
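
One way the explicit conversion might look is sketched below. The tiny frame is made-up sample data that only mimics a few of the columns from the question, and the exact dtypes melt returns can differ between pandas versions, so the cast is written to be harmless when the dtypes are already correct:

```python
import pandas as pd

# Hypothetical sample data with the same nullable dtypes as in the question.
frame = pd.DataFrame({
    "user_id": pd.array([1, 2], dtype="Int64"),
    "week": pd.array([1, 1], dtype="Int32"),
    "q1": pd.array([3, None], dtype="Int32"),
    "q2": pd.array([5, 4], dtype="Int32"),
})

melted = pd.melt(
    frame,
    id_vars=["user_id", "week"],
    value_vars=["q1", "q2"],
    var_name="question",
    value_name="score",
)

# Depending on the pandas version, these columns may come back as object.
# Casting them explicitly restores the nullable integer dtypes either way.
melted = melted.astype({"user_id": "Int64", "week": "Int32", "score": "Int32"})

print(melted.dtypes)
```

With score back to Int32, later numeric steps such as a pivot have a numeric column to work with again.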

