Problem description
I am storing time-series data in HDF5 format with pandas. Because I want to be able to access the data directly on disk, I write it in the PyTables (table) format by passing table=True.
However, after writing my TimeSeries objects to HDF5 this way, they appear to lose their frequency information.
This can be seen by toggling the is_table value in the script below:
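```python
import pandas as pd

is_table = False  # set to True to reproduce the lost frequency

times = pd.date_range('2000-1-1', periods=3, freq='H')
series = pd.Series(range(3), index=times)
print('frequency before =', series.index.freq)

frame = pd.DataFrame(series)  # single column labelled 0
# The original question used pd.get_store(...) and store.put(..., table=is_table);
# current pandas spells these pd.HDFStore(...) and format='table' / 'fixed'.
with pd.HDFStore('data/simple.h5') as store:
    store.put('data', frame, format='table' if is_table else 'fixed')

with pd.HDFStore('data/simple.h5') as store:
    x = store['data']
print('frequency after =', x[0].index.freq)
```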
With is_table = False:
frequency before = <1 Hour>
frequency after = <1 Hour>
With is_table = True:
frequency before = <1 Hour>
frequency after = None
Since PyTables seems to provide a much richer storage mechanism, I would have expected this not to be the case.
Is there a fundamental reason that PyTables cannot store, or reproduce, this information? Or is this possibly a bug in pandas?
Recommended answer
I have just confirmed with the pandas developers that this is not implemented in the current release.
See https://github.com/pydata/pandas/issues/3499#issuecomment-17262905 for how to resolve this.
I will update this answer when the feature becomes available.
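Until the frequency is stored in table format, one possible workaround, assuming the stored index is a regularly spaced DatetimeIndex, is to re-infer the frequency after reading the frame back. This is a minimal sketch and is not taken from the linked issue: restore_freq is a hypothetical helper, and the example simulates the frame read back from the store (same timestamps, freq None) instead of touching HDF5, so it runs without PyTables.

```python
import pandas as pd


def restore_freq(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `frame` with its index frequency re-inferred.

    Assumes `frame` has a DatetimeIndex. If no regular frequency can be
    inferred from the timestamps, the new index's freq is None.
    """
    restored = frame.copy()
    # freq="infer" asks pandas to derive the frequency from the timestamps themselves.
    restored.index = pd.DatetimeIndex(frame.index, freq="infer")
    return restored


# Simulated read-back from a table-format store: hourly timestamps, but no freq.
lost = pd.DataFrame(
    {0: range(3)},
    index=pd.DatetimeIndex(["2000-01-01 00:00", "2000-01-01 01:00", "2000-01-01 02:00"]),
)
print(lost.index.freq)  # None
print(restore_freq(lost).index.freq)  # an hourly offset
```

In the question's script this would be applied as x = restore_freq(store['data']). Inference can only recover a frequency the timestamps actually exhibit, so an index with gaps (or fewer than three timestamps) comes back with freq None rather than the frequency it originally had.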