Problem description
The problem seems to stem from reading the CSV back in with read_csv: the column comes back with the wrong type, so operations on the NumPy array no longer work. The following is a minimal working example.
import numpy as np
import pandas as pd
x = np.array([0.83151197,0.00444986])
df = pd.DataFrame({'numpy': [x]})
np.array(df['numpy']).mean()
Out[151]: array([ 0.83151197, 0.00444986])
This is what I expect. However, if I write the DataFrame to a file and then read it back into a pandas DataFrame, the types are broken:
x = np.array([0.83151197,0.00444986])
df = pd.DataFrame({'numpy': [x]})
df.to_csv('C:/temp/test5.csv')
df5 = pd.read_csv('C:/temp/test5.csv', dtype={'numpy': object})
np.array(df5['numpy']).mean()
以下是"df5"对象的输出
This is what the df5 object looks like:
df5
Out[186]:
Unnamed: 0 numpy
0 0 [0.83151197 0.00444986]
Here are the file contents:
,numpy
0,[ 0.83151197 0.00444986]
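A CSV file can only store text, so to_csv writes the array cell as its printed (str) form, and read_csv has no way of knowing that this text was once an array; dtype={'numpy': object} just keeps it as a Python string. A minimal sketch that reproduces this in memory, assuming io.StringIO as a stand-in for the question's C:/temp path:

```python
import io

import numpy as np
import pandas as pd

x = np.array([0.83151197, 0.00444986])
df = pd.DataFrame({"numpy": [x]})  # a single cell that holds the whole array

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv()
print(csv_text)

# Read it back from an in-memory buffer (stand-in for the file on disk).
df5 = pd.read_csv(io.StringIO(csv_text), dtype={"numpy": object})
cell = df5["numpy"].iloc[0]
print(type(cell))  # <class 'str'>: the array came back as its printed text
```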
The only way I have found to make this work is to read the data and convert the type manually, which seems silly and slow:
[float(num) for num in df5['numpy'][0][1:-1].split()]
Is there any way to avoid this?
Recommended answer
pd.DataFrame({'col_name': data}) expects data to be a 1D array-like object:
In [63]: pd.DataFrame({'numpy': [0.83151197,0.00444986]})
Out[63]:
numpy
0 0.831512
1 0.004450
In [64]: pd.DataFrame({'numpy': np.array([0.83151197,0.00444986])})
Out[64]:
numpy
0 0.831512
1 0.004450
You've wrapped the NumPy array in [], so you passed a list of NumPy arrays:
In [65]: pd.DataFrame({'numpy': [np.array([0.83151197,0.00444986])]})
Out[65]:
numpy
0 [0.83151197, 0.00444986]
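The difference is visible in the shape and column dtype: wrapping x in a list gives one row whose cell is an object, while passing x directly gives one float64 row per value. A small check, where wrapped and flat are hypothetical variable names chosen for illustration:

```python
import numpy as np
import pandas as pd

x = np.array([0.83151197, 0.00444986])

wrapped = pd.DataFrame({"numpy": [x]})  # list with ONE element: the whole array
flat = pd.DataFrame({"numpy": x})       # the array itself: one row per value

print(wrapped.shape, wrapped["numpy"].dtype)  # (1, 1) object
print(flat.shape, flat["numpy"].dtype)        # (2, 1) float64

# Only the flat layout gives an ordinary numeric column that CSV can round-trip.
print(flat["numpy"].mean())
```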
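Replace df = pd.DataFrame({'numpy': [x]}) with df = pd.DataFrame({'numpy': x}).
Demo (index=False also keeps the index out of the file, so no "Unnamed: 0" column appears when it is read back):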
In [56]: x = np.array([0.83151197,0.00444986])
...: df = pd.DataFrame({'numpy': x})
# (x is passed directly, no longer wrapped in [])
...: df.to_csv('d:/temp/test5.csv', index=False)
...:
In [57]: df5 = pd.read_csv('d:/temp/test5.csv')
In [58]: df5
Out[58]:
numpy
0 0.831512
1 0.004450
In [59]: df5.dtypes
Out[59]:
numpy float64
dtype: object
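If you really do need each cell to hold a whole array, the manual conversion from the question can be moved into read_csv itself through its converters parameter, so it runs once per cell while the file is parsed. A minimal sketch, assuming every cell is the str() form of a short 1D float array; parse_array is a hypothetical helper name. Keep in mind that str() of a NumPy array is a lossy format: it rounds floats to the print precision (8 decimal places by default) and summarizes arrays longer than 1000 elements with "...", so this only suits small arrays.

```python
import io

import numpy as np
import pandas as pd


def parse_array(cell: str) -> np.ndarray:
    """Hypothetical helper: turn '[0.83151197 0.00444986]' back into a float array."""
    # Drop the surrounding brackets, split on any whitespace (including newlines
    # NumPy inserts when wrapping long arrays), and convert to float64.
    return np.array(cell.strip("[]").split(), dtype=float)


x = np.array([0.83151197, 0.00444986])
df = pd.DataFrame({"numpy": [x]})  # one cell holding the whole array, as in the question

# Round-trip through an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)

# index_col=0 reuses the written index (no "Unnamed: 0" column);
# converters applies parse_array to every cell of the 'numpy' column while reading.
df5 = pd.read_csv(buf, index_col=0, converters={"numpy": parse_array})

arr = df5["numpy"].iloc[0]
print(type(arr), arr.mean())
```

Storing one value per row, as above, is still the simpler and lossless option; the converter approach is only a workaround for data that already has arrays in its cells.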