Problem description
The problem I am trying to solve is the following: I have a long-running Python process (it can take many hours to finish) that produces up to 80,000 HDF5 files. Since one of the bottlenecks is the constant opening and closing of these files, I wrote a proof-of-concept that uses a single HDF5 file as output, containing many tables. It certainly helps, but I wonder if there is a quick(er) way to export specified tables (with renaming if possible) into a separate file?
Recommended answer
Yes, there are at least three ways to copy the contents of a dataset from one HDF5 file to another:
- The `h5copy` command line utility from The HDF Group. You specify source and destination HDF5 files, along with source and destination objects. Likely this does exactly what you want without a lot of coding. Ref: HDF Group: H5Copy docs
- The h5py module has a `copy()` method for groups and/or datasets. You input source and destination objects.
- The pytables module (aka tables) has a `copy_node()` method. A node is a group and/or a dataset. You input source and destination objects.
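As a minimal sketch of the h5py route (all file, group, and dataset names here are hypothetical), `Group.copy()` can export a table into a fresh file and rename it in one call:

```python
import h5py
import numpy as np

# Build a small sample source file with one dataset (hypothetical names).
with h5py.File("source.h5", "w") as src:
    src.create_dataset("results/table_a", data=np.arange(10))

# Copy (and rename) the dataset into a separate file: the source object
# and the destination file/group are both passed to copy().
with h5py.File("source.h5", "r") as src, h5py.File("export.h5", "w") as dst:
    src.copy("results/table_a", dst, name="table_a_renamed")

# The exported file now contains only the renamed dataset.
with h5py.File("export.h5", "r") as f:
    print(list(f.keys()))  # ['table_a_renamed']
```

Because both files are opened once and the copy happens inside the HDF5 library, this avoids the per-file open/close overhead described in the question.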
If you choose to use h5py, there are a couple of relevant posts on SO.
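For the PyTables route, a comparable sketch (again with hypothetical file and node names) uses `copy_node()`, whose `newparent` may be a group in a different open file, so a table can be exported and renamed in one step:

```python
import numpy as np
import tables

# Build a small sample source file with one array node (hypothetical names).
with tables.open_file("src.h5", "w") as src:
    src.create_array("/", "table_a", np.arange(10))

# copy_node() copies the leaf into the destination file, renaming it
# via the newname argument.
with tables.open_file("src.h5", "r") as src, tables.open_file("out.h5", "w") as dst:
    src.copy_node("/table_a", newparent=dst.root, newname="table_a_renamed")

# Read the copied node back from the destination file.
with tables.open_file("out.h5", "r") as f:
    print(f.root.table_a_renamed[:])
```

`copy_node()` also accepts `recursive=True`, so an entire group of tables can be exported the same way.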