(In pandas) Why is frequency information lost when storing in HDF5 as a Table?


Problem description

I am storing time series data in HDF5 format with pandas. Because I want to be able to access the data directly on disk, I am using the PyTables Table format (table=True) when writing.

After writing my TimeSeries objects to HDF5, I appear to lose their frequency information.

This can be seen by toggling the is_table value in the script below:

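import os
import pandas as pd

# Toggle to switch between the fixed (False) and table (True) formats.
is_table = False

# 'H' is the hourly alias the question used; pandas 2.2+ spells it 'h'.
times = pd.date_range('2000-1-1', periods=3, freq='H')
series = pd.Series(range(3), index=times)

print('frequency before =', series.index.freq)

frame = pd.DataFrame(series)

os.makedirs('data', exist_ok=True)  # the store path below needs this directory

# The question's pandas version used pd.get_store(...) and
# put(..., table=is_table); later releases use pd.HDFStore and format=.
with pd.HDFStore('data/simple.h5') as store:
    store.put('data', frame, format='table' if is_table else 'fixed')

with pd.HDFStore('data/simple.h5') as store:
    x = store['data']

print('frequency after =', x[0].index.freq)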

With is_table = False:

frequency before = <1 Hour>
frequency after = <1 Hour>

With is_table = True:

frequency before = <1 Hour>
frequency after = None

Since the PyTables Table format is the richer storage mechanism of the two, I would have expected it to preserve this information rather than drop it.

Is there a fundamental reason that PyTables cannot store, or reproduce, this information? Or is this possibly a bug in pandas?
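
Separately from why the frequency is dropped, the timestamps themselves come back unchanged, so the frequency can usually be re-inferred after reading. The following is a minimal sketch of that idea, not a fix for the storage format. It assumes the index is regularly spaced; restore_freq is a hypothetical helper name, and the table-format round trip is simulated by rebuilding the index from raw values so the example runs without PyTables.

```python
import pandas as pd


def restore_freq(frame):
    """Return a copy of `frame` with the index frequency re-inferred.

    Hypothetical helper, not a pandas API. If no regular frequency
    can be inferred from the timestamps, freq stays None.
    """
    out = frame.copy()
    out.index = pd.DatetimeIndex(out.index, freq="infer")
    return out


# Stand-in for a frame read back from a table-format store: same hourly
# timestamps, but the index is rebuilt from raw values, so freq is None.
times = pd.date_range("2000-01-01", periods=3, freq=pd.offsets.Hour())
lost = pd.DataFrame({0: list(range(3))}, index=pd.DatetimeIndex(times.values))

restored = restore_freq(lost)
print("frequency lost     =", lost.index.freq)
print("frequency restored =", restored.index.freq)
```

In the script above, the same call would be applied to x after reading it from the store. Inference only recovers what the timestamps imply, so an irregular index is left without a frequency.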

Recommended answer