Problem description
I am using numpy's .astype() method to convert data types, but it gives a strange result. Consider the following code:
import pandas as pd
import numpy as np
import sys
df = pd.DataFrame([[0.1, 2, 'a']], columns=["a1", "a2", "str"])
arr = df.to_records(index=False)
dtype1 = [('a1', np.float32), ('a2', np.int32), ('str', '|S2')]
dtype2 = [('a2', np.int32), ('a1', np.float32), ('str', '|S2')]
arr1 = arr.astype(dtype1)
arr2 = arr.astype(dtype2)
print(arr1)
print(arr2)
print(arr)
print(sys.version)
print(np.__version__)
print(pd.__version__)
I tested it on different Python versions and got different results. The newer version gives me an unexpected result:
[(0.1, 2, b'a')]
[(0, 2., b'a')]
[(0.1, 2, 'a')]
3.6.5 |Anaconda custom (64-bit)| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
1.15.0
0.23.4
While the older version gives the correct result:
[(0.10000000149011612, 2, 'a') (0.10000000149011612, 2, 'b')]
[(2, 0.10000000149011612, 'a') (2, 0.10000000149011612, 'b')]
[(0.1, 2L, 'a') (0.1, 2L, 'b')]
2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:53:40) [MSC v.1500 64 bit (AMD64)]
1.11.1
0.20.3
Can someone tell me what is going on?
Recommended answer
https://docs.scipy.org/doc/numpy/user/basics.rec.html#assignment-from-other-structured-arrays says that assignment from other structured arrays is by position, not by field name. I think that applies to astype as well. If so, it means you can't reorder fields with astype.
Accessing multiple fields at once has changed in recent releases, and may change further. Part of the question is whether such access should return a copy or a view.
recfunctions has code for adding, deleting, or merging fields. A common strategy is to create a target array with the new dtype and copy values to it by field name. This is iterative, but since an array typically has many more records than fields, the time penalty isn't big.
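The copy-by-field-name strategy described above could be sketched roughly as follows. The helper name copy_by_field_name is made up for illustration and is not part of numpy, and the sketch assumes every field in the new dtype also exists in the source array.

```python
import numpy as np

def copy_by_field_name(src, new_dtype):
    """Hypothetical helper: build an array of new_dtype, filled from src by field name."""
    new_dtype = np.dtype(new_dtype)
    # Target array with the desired field order and types.
    out = np.zeros(src.shape, dtype=new_dtype)
    # One assignment per field; each assignment casts to the target field type.
    for name in new_dtype.names:
        out[name] = src[name]
    return out

dt1 = np.dtype([('a', float), ('b', int), ('c', 'U3')])
dt2 = np.dtype([('b', int), ('a', float), ('c', 'S3')])
arr1 = np.array([(1, 2, 'a'), (3, 4, 'b'), (5, 6, 'c')], dt1)

arr2 = copy_by_field_name(arr1, dt2)
print(arr2['a'])  # values of field 'a' follow the name, not the position
```

The loop runs once per field rather than once per record, which is why the cost stays small for arrays with many records.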
In version 1.14, I can do:
In [152]: dt1 = np.dtype([('a',float),('b',int), ('c','U3')])
In [153]: dt2 = np.dtype([('b',int),('a',float), ('c','S3')])
In [154]: arr1 = np.array([(1,2,'a'),(3,4,'b'),(5,6,'c')], dt1)
In [155]: arr1
Out[155]:
array([(1., 2, 'a'), (3., 4, 'b'), (5., 6, 'c')],
dtype=[('a', '<f8'), ('b', '<i8'), ('c', '<U3')])
Simply using astype does not reorder the fields:
In [156]: arr1.astype(dt2)
Out[156]:
array([(1, 2., b'a'), (3, 4., b'b'), (5, 6., b'c')],
dtype=[('b', '<i8'), ('a', '<f8'), ('c', 'S3')])
But multi-field indexing does:
In [157]: arr1[['b','a','c']]
Out[157]:
array([(2, 1., 'a'), (4, 3., 'b'), (6, 5., 'c')],
dtype=[('b', '<i8'), ('a', '<f8'), ('c', '<U3')])
Now the astype to dt2 is right:
In [158]: arr2 = arr1[['b','a','c']].astype(dt2)
In [159]: arr2
Out[159]:
array([(2, 1., b'a'), (4, 3., b'b'), (6, 5., b'c')],
dtype=[('b', '<i8'), ('a', '<f8'), ('c', 'S3')])
In [160]: arr1['a']
Out[160]: array([1., 3., 5.])
In [161]: arr2['a']
Out[161]: array([1., 3., 5.])
This is 1.14; you are using 1.15, and the docs mention differences in 1.16. So this is a moving target.
astype behaves the same as assignment to a 'blank' array:
In [162]: arr2 = np.zeros(arr1.shape, dt2)
In [163]: arr2
Out[163]:
array([(0, 0., b''), (0, 0., b''), (0, 0., b'')],
dtype=[('b', '<i8'), ('a', '<f8'), ('c', 'S3')])
In [164]: arr2[:] = arr1
In [165]: arr2
Out[165]:
array([(1, 2., b'a'), (3, 4., b'b'), (5, 6., b'c')],
dtype=[('b', '<i8'), ('a', '<f8'), ('c', 'S3')])
In [166]: arr2[:] = arr1[['b','a','c']]
In [167]: arr2
Out[167]:
array([(2, 1., b'a'), (4, 3., b'b'), (6, 5., b'c')],
dtype=[('b', '<i8'), ('a', '<f8'), ('c', 'S3')])
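As a possible alternative on NumPy 1.16 or later, numpy.lib.recfunctions has a require_fields function that casts to a new dtype by assigning fields by name. The following is a minimal sketch that assumes such a version is installed; it was not part of the 1.14 session above.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt1 = np.dtype([('a', float), ('b', int), ('c', 'U3')])
dt2 = np.dtype([('b', int), ('a', float), ('c', 'S3')])
arr1 = np.array([(1, 2, 'a'), (3, 4, 'b'), (5, 6, 'c')], dt1)

# Assumes NumPy >= 1.16: fields are matched by name, not by position.
arr2 = rfn.require_fields(arr1, dt2)
print(arr2['a'])
print(arr2.dtype)
```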