Problem Description
I have a number of large (13GB+ in size) h5 files; each h5 file has two datasets that were created with pandas:
df.to_hdf('name_of_file_to_save', 'key_1', table=True)
df.to_hdf('name_of_file_to_save', 'key_2', table=True)  # saved to the same h5 file as above
I have seen the post here on using odo to concatenate h5 files. What I want to do is, for each h5 file that was created (each having key_1 and key_2), combine them so that all of the key_1 data are in one dataset in the new h5 file and all of the key_2 data are in another dataset in the same new h5 file. All of the key_1 datasets have the same number of columns, and the same applies to key_2.
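For reference, the same combination can be expressed with plain pandas instead of odo. This is a minimal sketch, assuming the sources were saved in table format (as with table=True above) so they can be read in chunks; the output file name and chunk size are placeholders, and a very wide table may still hit the HDF5 attribute limit described below:

import pandas as pd

files = ['file1.h5', 'file2.h5', 'file3.h5', 'file4.h5']

with pd.HDFStore('combined.h5', mode='w') as out:  # hypothetical output file
    for f in files:
        for key in ('key_1', 'key_2'):
            # Read each source table in chunks so a 13GB+ file never has
            # to fit in memory at once, then append the rows to the
            # matching dataset in the combined file.
            for chunk in pd.read_hdf(f, key, chunksize=500000):
                out.append(key, chunk)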
This is what I have tried:
from odo import odo

files = ['file1.h5', 'file2.h5', 'file3.h5', 'file4.h5']
for i in files:
    odo('hdfstore://path_to_here_h5_files_live/%s::key_1' % i,
        'hdfstore://path_store_new_large_h5::key_1')
But I get an error:
(tables/hdf5extension.c:7824)
tables.exceptions.HDF5ExtError: HDF5 error back trace
File "H5A.c", line 259, in H5Acreate2
unable to create attribute
File "H5Aint.c", line 275, in H5A_create
unable to create attribute in object header
File "H5Oattribute.c", line 347, in H5O_attr_create
unable to create new attribute in header
File "H5Omessage.c", line 224, in H5O_msg_append_real
unable to create new message
File "H5Omessage.c", line 1945, in H5O_msg_alloc
unable to allocate space for message
File "H5Oalloc.c", line 1142, in H5O_alloc
object header message is too large
End of HDF5 error back trace
Can't set attribute 'non_index_axes' in node:
/key_1 (Group) ''.
Closing remaining open
Recommended Answer
For this specific case, the problem was having too many columns, which exceeded the size limit HDF5 places on a single object-header attribute: pandas stores the column metadata in the non_index_axes attribute that the traceback complains about, and with enough columns that attribute no longer fits. The solution is to load the dataframe and transpose it, so the overly wide column axis becomes rows before saving.
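A minimal sketch of that workaround, assuming each source dataframe fits in memory and has far more columns than rows, as in the author's case (file names are placeholders; format='table' is the modern spelling of table=True):

import pandas as pd

# Load the overly wide dataframe, transpose it so columns become rows,
# and save the transposed table; the column metadata stored in the
# non_index_axes attribute is then small enough to fit.
df = pd.read_hdf('file1.h5', 'key_1')
df.T.to_hdf('file1_transposed.h5', 'key_1', format='table')

The concatenation step can then be run over the transposed files, transposing back after reading.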