Problem description
Consider the following example:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, "a"], [2, "b"]], columns=["int", "str"])
df.astype({"int":np.int8, "str": np.dtype('|S2')})
arr = df.to_records(index=False)
print(arr.dtype.descr)
I expected to see:
[(u'int', '<i8'), (u'str', '|S2')]
Instead, I got:
[(u'int', '<i8'), (u'str', '|O')]
Why, and what does '|O' mean?
I also tried df.astype({"int":np.int8, "str": '|S2'}) and got the same result.
Recommended answer
When you create your DataFrame, the strings are stored as dtype object (the types you specify are only passed to astype afterwards):
df.dtypes
int int64
str object
dtype: object
astype is not an in-place operation, so your command currently has no effect; you need to reassign the result:
df = df.astype({"int":np.int8, "str": np.dtype('|S2')})
However, this still does not convert the strings from object:
df.dtypes
int int8
str object
dtype: object
So when you use to_records, object is used instead of your designated type.
A fix would be to create your string Series separately and assign it to your DataFrame:
s = pd.Series(['a', 'b'], dtype=np.dtype('|S2'))
df['d'] = s
df.dtypes
int int8
str object
d |S2
dtype: object
Then use to_records:
df.to_records(index=False)
rec.array([(1, b'a', b'a'), (2, b'b', b'b')],
dtype=[('int', 'i1'), ('str', 'O'), ('d', 'S2')])
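As a possible alternative that is not part of the original answer: to my knowledge, pandas 0.24 and later accept a column_dtypes argument on to_records, which requests the record dtypes directly and leaves the DataFrame columns unchanged. The sketch below assumes such a pandas version and ASCII-only strings. Behavior may differ on older releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, "a"], [2, "b"]], columns=["int", "str"])

# Ask to_records for the target dtypes; the DataFrame itself is not modified.
arr = df.to_records(index=False, column_dtypes={"int": "int8", "str": "S2"})

print(arr.dtype.descr)
print(arr["str"])
```

The "str" column of the DataFrame keeps its original dtype. Only the resulting record array uses the fixed-width bytes type.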