I'm trying to save a DataFrame into an HDF5 store using the built-in pandas function to_hdf,
but it raises the following exception:


  >> File "C:\python\lib\site-packages\pandas\io\pytables.py", line 3433, in create_axes
     raise e
  TypeError: Cannot serialize the column [date] because
  its data contents are [datetime] object dtype


The DataFrame is built from a numpy array, and each column has the correct type.

I tried convert_objects(), as I read in other threads, but it still fails.

Here is my test code. I'm obviously missing something in the data conversion, but I can't figure out what.

import numpy as np
import pandas as pd
from datetime import datetime, timedelta

columns = ['date', 'c1', 'c2']

# building a sample test numpy array with datetime, float and integer
dtype = np.dtype("datetime64, f8, i2")
np_data = np.empty((0, len(columns)), dtype=dtype)
for i in range(1, 3):
    line = [datetime(2015, 1, 1, 12, i), i // 2, i*1000]  # // matches the Python 2 output below
    np_data = np.append(np_data, np.array([line]), axis=0)
print('##### the numpy array')
print(np_data)

# creating DataFrame from numpy array
df = pd.DataFrame(np_data, columns=columns)
# trying to force object conversion
df.convert_objects()
print('##### the DataFrame array')
print(df)

# the following fails!
try:
    df.to_hdf('store.h5', 'data', append=True)
    print('worked')
except Exception as e:
    print('##### the error')
    print(e)


The code above produces the following output:

##### the numpy array
[[datetime.datetime(2015, 1, 1, 12, 1) 0 1000]
 [datetime.datetime(2015, 1, 1, 12, 2) 1 2000]]
##### the DataFrame array
                  date c1    c2
0  2015-01-01 12:01:00  0  1000
1  2015-01-01 12:02:00  1  2000
##### the error
Cannot serialize the column [date] because
its data contents are [datetime] object dtype

Best answer

Almost all pandas operations return a new object. Your .convert_objects() call threw its result away.
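In current pandas, convert_objects() is no longer available (it was deprecated in 0.17 and later removed); DataFrame.infer_objects() is the closest replacement for this kind of soft conversion. The point stands either way: the method returns a new frame, so the result has to be assigned. A minimal sketch, using a small hypothetical object-dtype frame modelled on the question's data:

```python
import datetime as dt

import pandas as pd

# Hypothetical frame shaped like the question's: every column is object dtype.
df = pd.DataFrame(
    [[dt.datetime(2015, 1, 1, 12, 1), 0, 1000],
     [dt.datetime(2015, 1, 1, 12, 2), 1, 2000]],
    columns=['date', 'c1', 'c2'],
    dtype=object,
)

df.infer_objects()        # result discarded: df itself is not modified
print(df.dtypes)          # still object for all three columns

df2 = df.infer_objects()  # keep the new, converted frame
print(df2.dtypes)         # date becomes datetime64, c1 and c2 become int64
```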

In [20]: df2 = df.convert_objects()

In [21]: df.dtypes
Out[21]:
date    object
c1      object
c2      object
dtype: object

In [22]: df2.dtypes
Out[22]:
date    datetime64[ns]
c1               int64
c2               int64
dtype: object
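Why every column starts out as object: a 2-D NumPy array has a single dtype for all of its elements, so rows that mix datetime objects and numbers end up in an object array (as the printed array in the question shows), and the per-column types the question intended are lost. A sketch, with illustrative variable names, that keeps one typed 1-D array per column instead:

```python
import datetime as dt

import numpy as np
import pandas as pd

rows = [[dt.datetime(2015, 1, 1, 12, i), i // 2, i * 1000] for i in range(1, 3)]

# One 2-D array for mixed data: NumPy falls back to a single object dtype.
mixed = np.array(rows)
print(mixed.dtype)  # object

# One typed 1-D array per column keeps the intended dtypes.
frame = pd.DataFrame({
    'date': np.array([r[0] for r in rows], dtype='datetime64[ns]'),
    'c1': np.array([r[1] for r in rows], dtype='f8'),
    'c2': np.array([r[2] for r in rows], dtype='i2'),
})
print(frame.dtypes)  # datetime64[ns], float64, int16
```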


Save and restore:

In [23]: df2.to_hdf('store.h5', 'data', append=True)

In [25]: pd.read_hdf('store.h5','data')
Out[25]:
                 date  c1    c2
0 2015-01-01 12:01:00   0  1000
1 2015-01-01 12:02:00   1  2000

In [26]: pd.read_hdf('store.h5','data').dtypes
Out[26]:
date    datetime64[ns]
c1               int64
c2               int64
dtype: object
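The error message only names one offending column at a time. As a quick check before calling to_hdf(..., append=True), a hypothetical helper can list every column still stored as object dtype. Treat it as a heuristic only: object columns that hold plain strings can still be written in table format, so a hit here is not always an error.

```python
import datetime as dt

import pandas as pd

def object_columns(frame):
    """Hypothetical helper: names of columns still stored as object dtype."""
    return list(frame.select_dtypes(include='object').columns)

raw = pd.DataFrame(
    [[dt.datetime(2015, 1, 1, 12, 1), 0, 1000]],
    columns=['date', 'c1', 'c2'],
    dtype=object,
)
print(object_columns(raw))                  # ['date', 'c1', 'c2']
print(object_columns(raw.infer_objects()))  # []
```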


Finally, it is more idiomatic to construct the DataFrame directly; the dtypes are then inferred at construction time.

In [32]: pd.DataFrame({'date' : pd.date_range('20150101',periods=2,freq='s'),'c1' : [0,1], 'c2' : [1000,2000]},columns=['date','c1','c2']).dtypes
Out[32]:
date    datetime64[ns]
c1               int64
c2               int64
dtype: object
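Applied to the question's own loop, a minimal sketch: collect plain Python rows in a list and pass that list to the constructor, which infers one dtype per column. The datetime resolution that gets reported may differ between pandas versions.

```python
import datetime as dt

import pandas as pd

columns = ['date', 'c1', 'c2']

# Collect plain Python rows instead of appending to a NumPy array.
rows = []
for i in range(1, 3):
    rows.append([dt.datetime(2015, 1, 1, 12, i), i // 2, i * 1000])

# The constructor infers a dtype per column from the row values.
df = pd.DataFrame(rows, columns=columns)
print(df.dtypes)  # date: datetime64, c1 and c2: int64

# With typed columns, df.to_hdf('store.h5', 'data', append=True) should
# then work as in the answer (writing HDF5 requires PyTables).
```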

10-06 06:42