(pandas) Why is frequency information lost when data is stored in HDF5 in table format?

Question

I am storing time series data in HDF5 format using pandas. Because I want to be able to access the data directly on disk, I am writing it in the PyTables table format by passing table=True.

It appears that I then lose the frequency information on my TimeSeries objects after writing them to HDF5.

This can be seen by toggling the is_table value in the script below:

import pandas as pd

is_table = False

times = pd.date_range('2000-1-1', periods=3, freq='H')
series = pd.Series(xrange(3), index=times)

print 'frequency before =', series.index.freq

frame = pd.DataFrame(series)

with pd.get_store('data/simple.h5') as store:
    store.put('data', frame, table=is_table)

with pd.get_store('data/simple.h5') as store:
    x = store['data']

print 'frequency after =', x[0].index.freq

is_table = False:

frequency before = <1 Hour>
frequency after = <1 Hour>

is_table = True:

frequency before = <1 Hour>
frequency after = None

PyTables seems to me to provide a much richer storage mechanism, so I would not have expected the frequency to be lost.

Is there a fundamental reason that PyTables cannot store, or reproduce, this information? Or is this possibly a bug in pandas?

Recommended answer

I have just confirmed with the pandas project that this is not implemented in the current release.

See https://github.com/pydata/pandas/issues/3499#issuecomment-17262905 for the resolution.

I will update this answer when it becomes available.
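
In the meantime, one possible workaround (not part of the original answer) is to re-infer the frequency from the timestamps after reading the data back. The following is a minimal sketch, assuming a regularly spaced DatetimeIndex with at least three entries. restore_freq is a hypothetical helper name, and the frequency loss is simulated here by rebuilding the index from its raw values rather than by an actual HDF5 round trip.

```python
import pandas as pd


def restore_freq(frame):
    """Return a copy of frame whose DatetimeIndex carries its inferred frequency.

    Hypothetical helper: the frame is returned unchanged when the index is too
    short for inference or is not regularly spaced.
    """
    index = frame.index
    if len(index) < 3:
        # pd.infer_freq needs at least three timestamps.
        return frame
    inferred = pd.infer_freq(index)  # None when no regular frequency fits
    if inferred is None:
        return frame
    restored = frame.copy()
    restored.index = pd.DatetimeIndex(index, freq=inferred)
    return restored


# Build an hourly frame comparable to the one in the question.
times = pd.date_range('2000-01-01', periods=3, freq=pd.offsets.Hour())
frame = pd.DataFrame({'value': range(3)}, index=times)

# Simulate the loss (assumption: stands in for reading a table-format store).
stripped = frame.copy()
stripped.index = pd.DatetimeIndex(times.values)
print('frequency after loss    =', stripped.index.freq)

restored = restore_freq(stripped)
print('frequency after restore =', restored.index.freq)
```

Inference only recovers a frequency that the timestamps themselves imply, so an irregular or very short index stays without one.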

08-06 09:52