Problem description
Why does pandas tell me that I have objects, although every item in the selected column is a string, even after explicit conversion?
Here is my DataFrame:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 56992 entries, 0 to 56991
Data columns (total 7 columns):
id 56992 non-null values
attr1 56992 non-null values
attr2 56992 non-null values
attr3 56992 non-null values
attr4 56992 non-null values
attr5 56992 non-null values
attr6 56992 non-null values
dtypes: int64(2), object(5)
Five of them are of dtype object. I explicitly convert those columns to strings:
# Convert every object-dtype column to str (Python 3 syntax)
for c in df.columns:
    if df[c].dtype == object:
        print("convert", df[c].name, "to string")
        df[c] = df[c].astype(str)
Then df["attr2"] still has dtype object, although type(df["attr2"].ix[0]) reveals str, which is correct.
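The observation can be reproduced on a small scale. The following is a minimal sketch with a made-up three-row frame (the column names mirror the question, the values are invented). It runs the same conversion loop and then contrasts the column-level dtype with the element-level type. pd.api.types.infer_dtype inspects the stored values rather than the storage dtype, so it reports "string" even when the column's dtype is object. Because the dtype reported after astype(str) can differ between pandas versions, the sketch only prints it.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's 56992-row frame.
df = pd.DataFrame({
    "id": [0, 1, 2],
    "attr1": [10, 20, 30],
    "attr2": ["a", "bb", "ccc"],
})

# Same conversion loop as in the question.
for c in df.columns:
    if df[c].dtype == object:
        df[c] = df[c].astype(str)

# Column-level dtype: object in classic pandas (may differ by version).
print(df["attr2"].dtype)

# Element-level type: each stored value is a plain Python str.
print(type(df["attr2"].iloc[0]))

# infer_dtype looks at the values themselves, not the storage dtype.
print(pd.api.types.infer_dtype(df["attr2"]))
```

In other words, "dtype object" describes how the column is stored, while type(...) and infer_dtype describe what is stored in it.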
Pandas distinguishes between int64, float64 and object. What is the logic behind this when there is no str dtype? Why are strings covered by object?
Recommended answer
The object dtype comes from NumPy, where a dtype describes the type of the elements in an ndarray. Every element in an ndarray must have the same size in bytes. For int64 and float64, that size is 8 bytes. Strings, however, do not have a fixed length. So instead of storing the bytes of the strings directly in the ndarray, pandas uses an object ndarray, which stores pointers to objects; that is why the dtype of such an ndarray is object.
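The fixed-size rule can be observed directly in NumPy through itemsize, the number of bytes per element. In this minimal sketch the word list is invented. An int64 array uses 8 bytes per element. NumPy can store strings inline only with a fixed-width unicode dtype, which pads every element to the longest string at 4 bytes per character. An object array instead stores one pointer per element, whatever the string lengths. The pointer size (8 bytes) assumes a typical 64-bit build.

```python
import numpy as np

words = ["a", "bb", "ccc"]

# Fixed-size numeric elements: every int64 takes 8 bytes.
ints = np.array([1, 2, 3, 4], dtype=np.int64)
print(ints.dtype, ints.itemsize)       # int64 8

# Inline strings need a fixed width: NumPy pads to the longest string.
fixed = np.array(words)
print(fixed.dtype, fixed.itemsize)     # dtype U3, itemsize 12 (3 chars x 4 bytes)

# Object array: each slot holds a pointer to a Python str, not the text.
objs = np.array(words, dtype=object)
print(objs.dtype, objs.itemsize)       # object 8 on a 64-bit build
print(type(objs[2]), len(objs[2]))     # <class 'str'> 3
```

The fixed-width variant shows the cost of storing strings inline: short strings waste space and the width must be known up front, which the pointer-based object array avoids.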
Here is an example:
- An int64 array holds 4 int64 values.
- An object array holds 4 pointers to 3 string objects.
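The two arrays in the example can be rebuilt as a small sketch (the values are made up). The object array has 4 slots but only 3 distinct string objects, because two slots point to the same str. Checking identity with is and id shows that the array hands back the stored objects themselves rather than copies.

```python
import numpy as np

# An int64 array: 4 values stored inline, 8 bytes each.
ints = np.array([10, 20, 30, 40], dtype=np.int64)
print(ints.nbytes)                     # 32

# An object array: 4 pointers, but only 3 distinct str objects,
# because slots 0 and 2 refer to the same string.
shared = "pandas"
objs = np.array([shared, "numpy", shared, "str"], dtype=object)

print(len(objs))                       # 4
print(len({id(x) for x in objs}))      # 3
print(objs[0] is objs[2])              # True
```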