Problem description
I have some pivot code that fails with the error:
pandas.core.base.DataError: No numeric types to aggregate
I have tracked the problem down to an earlier call to pandas.melt.
Here are the dtypes before the melt:
frame.dtypes
user_id Int64
feature object
seconds_since_start_assigned Int32
total float32
programme_ids object
q1 Int32
q2 Int32
q3 Int32
q4 Int32
q5 Int32
q6 Int32
q7 Int32
q8 Int32
q9 Int32
week Int32
Now for the melt:
frame1 = pd.melt(
frame,
id_vars=['user_id', 'week'],
value_vars=['q1', 'q2', 'q3', 'q4', 'q5', 'q6', 'q7', 'q8', 'q9'],
var_name='question',
value_name='score')
frame1.dtypes
user_id object
week object
question object
score object
Why has the call to melt replaced the Int32 I need for score with object?
Recommended answer
You are using the nullable integer data type (note the capital 'I' in 'Int32'). It is still a fairly new data type, so not all of the functionality is there yet. In particular, there is a big warning under the Construction section of the documentation: Series cannot infer a nullable integer dtype, though perhaps it will someday:
"In the future, we may provide an option for Series to infer a nullable-integer dtype."
We can see this ourselves. A Series will not infer the correct type and is left with object, the only dtype it can use to hold the integers together with the nullable-integer missing value. Arrays work, though.
import pandas as pd
arr = [1, pd.NA, 4]  # pd.NA is the public missing-value marker for nullable dtypes
pd.Series(arr)
#0 1
#1 <NA>
#2 4
#dtype: object # <- Did not infer the type :(
pd.array(arr)
#<IntegerArray>
#[1, <NA>, 4]
#Length: 3, dtype: Int64
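Since the Series constructor does not infer the nullable dtype, one way around it is to name the dtype explicitly. A minimal sketch, using the same values as above (the variable names are illustrative only):

```python
import pandas as pd

arr = [1, pd.NA, 4]

# Passing dtype= skips inference, so the result is a nullable integer
# Series rather than an object one.
s = pd.Series(arr, dtype="Int64")

# An existing object Series can be converted after the fact as well.
converted = pd.Series(arr).astype("Int64")
```

Both `s` and `converted` should end up with dtype Int64 and keep the missing value as `<NA>`.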
So you melt, get a Series, and pandas cannot infer the dtype, so the column is cast to object after the melt. For now, you'll have to explicitly convert back to 'Int32'.
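The explicit conversion might look like the sketch below. The frame here is a hypothetical two-question miniature of the one in the question, not the asker's real data, and `astype` with a column-to-dtype mapping is just one way to do it. Whether melt actually drops the dtype may depend on the pandas version, so the cast is written to be harmless if the dtype was already preserved.

```python
import pandas as pd

# Hypothetical miniature of the frame from the question (two questions only).
frame = pd.DataFrame({
    "user_id": pd.array([1, 2], dtype="Int64"),
    "week": pd.array([1, 1], dtype="Int32"),
    "q1": pd.array([3, None], dtype="Int32"),
    "q2": pd.array([5, 4], dtype="Int32"),
})

frame1 = pd.melt(
    frame,
    id_vars=["user_id", "week"],
    value_vars=["q1", "q2"],
    var_name="question",
    value_name="score",
)

# Cast the affected columns back to their nullable integer dtypes.
# This is a no-op for any column that already has the right dtype.
frame1 = frame1.astype({"user_id": "Int64", "week": "Int32", "score": "Int32"})
```

With score back to Int32, the later pivot has a numeric column to aggregate again.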