Is there a way to force pandas to write an empty DataFrame to an HDF file?

import pandas as pd
df = pd.DataFrame(columns=['x','y'])
df.to_hdf('temp.h5', 'xxx')
df2 = pd.read_hdf('temp.h5', 'xxx')


Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 389, in read_hdf
    return store.select(key, auto_close=auto_close, **kwargs)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 740, in select
    return it.get_result()
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 1518, in get_result
    results = self.func(self.start, self.stop, where)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 733, in func
    columns=columns)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 2986, in read
    idx=i), start=_start, stop=_stop)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 2575, in read_index
    _, index = self.read_index_node(getattr(self.group, key), **kwargs)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 2676, in read_index_node
    data = node[start:stop]
  File ".../Python-3.6.3/lib/python3.6/site-packages/tables/vlarray.py", line 675, in __getitem__
    return self.read(start, stop, step)
  File ".../Python-3.6.3/lib/python3.6/site-packages/tables/vlarray.py", line 811, in read
    listarr = self._read_array(start, stop, step)
  File "tables/hdf5extension.pyx", line 2106, in tables.hdf5extension.VLArray._read_array (tables/hdf5extension.c:24649)
ValueError: cannot set WRITEABLE flag to True of this array


Writing with format='table':

import pandas as pd
df = pd.DataFrame(columns=['x','y'])
df.to_hdf('temp.h5', 'xxx', format='table')
df2 = pd.read_hdf('temp.h5', 'xxx')


Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 389, in read_hdf
    return store.select(key, auto_close=auto_close, **kwargs)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 722, in select
    raise KeyError('No object named {key} in the file'.format(key=key))
KeyError: 'No object named xxx in the file'


pandas version: 0.24.2

Thanks for your help!

Best answer

Putting the empty DataFrame into an HDFStore in fixed format should work (you may need to check the versions of other packages, e.g. tables):

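import pandas as pd
import tables

# Versions
pd.__version__
tables.__version__

# DF: empty frame that only defines the columns
df = pd.DataFrame(columns=['x','y'])
df

# Dump in fixed format ('f' is shorthand for 'fixed') and read it back
with pd.HDFStore('temp.h5') as store:
    store.put('df', df, format='f')
    print('Read:')
    store.select('df')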

>>> '0.24.2'
>>> '3.5.1'
>>>   x     y
>>>
>>> Read:
>>>   x     y


PyTables does indeed forbid this (or at least it did), but pandas has a workaround for it in the fixed format.
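The same round trip can be written with the to_hdf / read_hdf calls from the question. This is only a minimal sketch: it assumes a pandas and PyTables combination in which the fixed-format workaround is functional, and it writes to a temporary directory rather than temp.h5 so that it leaves nothing behind.

```python
import os
import tempfile

import pandas as pd  # HDF5 support also needs the PyTables package ("tables")

# Empty frame that only defines the columns, as in the question.
df = pd.DataFrame(columns=['x', 'y'])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'temp.h5')
    # format='fixed' is the format for which pandas handles empty arrays itself.
    df.to_hdf(path, key='xxx', format='fixed')
    restored = pd.read_hdf(path, key='xxx')

# The restored frame should carry the same columns and no rows.
print(list(restored.columns), len(restored))
```

If this still fails with the ValueError from the question, the installed tables and numpy versions are the first thing to check.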

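However, as discussed in the same GitHub issue, some effort has also gone into fixing this behavior for the table format. The solution still appears to be "up in the air", as it was at the end of March.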
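Until the table format handles this case, one option is to handle the missing key on the reading side. The sketch below is not a pandas feature: read_hdf_or_empty is a hypothetical helper, and it assumes the caller already knows which columns the frame should have. It catches the KeyError shown in the question and falls back to an empty frame.

```python
import os
import tempfile

import pandas as pd  # HDF5 support also needs the PyTables package ("tables")


def read_hdf_or_empty(path, key, columns):
    """Read `key` from `path`; if the key is absent, return an empty frame.

    Hypothetical helper: `columns` must be supplied by the caller, because
    nothing about the frame was stored in the file.
    """
    try:
        return pd.read_hdf(path, key=key)
    except KeyError:
        # Raised when the empty frame was silently skipped on write.
        return pd.DataFrame(columns=columns)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'temp.h5')
    empty = pd.DataFrame(columns=['x', 'y'])
    empty.to_hdf(path, key='xxx', format='table')
    result = read_hdf_or_empty(path, 'xxx', columns=['x', 'y'])

print(list(result.columns), len(result))
```

The fallback frame carries only the column names; dtypes and index are not recovered, so this fits best where an empty result just means "no rows yet".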

08-24 23:47