This article covers how to add data along a new axis to an existing HDF5 file using h5py.

Question

I have some sample code that generates a 3D NumPy array, which I then save to an HDF5 file using h5py. How can I then "append" a second dataset along the 4th dimension? In other words, how can I write another 3D array along the 4th dimension (a new axis) of an existing .h5 file? I have read the documentation I could find, and none of the examples seem to address this. My code is shown below:

import h5py
import numpy as np

dataset1 = np.random.rand(240, 240, 250)
dataset2 = np.random.rand(240, 240, 250)

with h5py.File('data.h5', 'w') as hf:
    dset = hf.create_dataset('dataset_1', data=dataset1)

Recommended answer

Working from http://docs.h5py.org/en/latest/high/dataset.html, I ran a few experiments:

In [504]: import h5py
In [505]: f=h5py.File('data.h5','w')
In [506]: data=np.ones((3,5))

Create an ordinary dataset:

In [509]: dset=f.create_dataset('dset', data=data)
In [510]: dset.shape
Out[510]: (3, 5)
In [511]: dset.maxshape
Out[511]: (3, 5)

The help for resize:

In [512]: dset.resize?
Signature: dset.resize(size, axis=None)
Docstring:
Resize the dataset, or the specified axis.

The dataset must be stored in chunked format; it can be resized up to
the "maximum shape" (keyword maxshape) specified at creation time.
The rank of the dataset cannot be changed.

Since I didn't specify maxshape, it doesn't look like I can resize or add to this dataset.
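As a small sketch of that limitation (the temporary file path is made up for this example, and the exact exception type may differ between h5py versions, so it is caught broadly here), a dataset created without maxshape is stored unchunked and refuses to be resized:

```python
import os
import tempfile

import h5py
import numpy as np

# Throwaway file location, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "fixed.h5")

with h5py.File(path, "w") as f:
    # No maxshape and no chunks: a plain, fixed-size dataset.
    dset = f.create_dataset("dset", data=np.ones((3, 5)))
    layout = (dset.chunks, dset.maxshape)

    try:
        dset.resize((3, 10))
        resize_failed = False
    except Exception as exc:  # exception type not assumed; just report it
        resize_failed = True
        print(type(exc).__name__, exc)
```

Here `layout` records that the dataset has no chunking and that its maxshape equals its current shape, which is why the resize attempt is rejected.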

In [513]: dset1=f.create_dataset('dset1', data=data, maxshape=(2,10,10))
...
ValueError: "maxshape" must have same rank as dataset shape

So I can't define a 3D 'space' and put a 2D array in it, at least not this way.

But I can add a dimension to data (increase its rank) before storing it:

In [514]: dset1=f.create_dataset('dset1', data=data[None,...], maxshape=(2,10,10))
In [515]: dset1
Out[515]: <HDF5 dataset "dset1": shape (1, 3, 5), type "<f8">

Now I can resize the dataset in one or more dimensions, up to the defined maximum.

In [517]: dset1.resize((2,3,10))
In [518]: dset1
Out[518]: <HDF5 dataset "dset1": shape (2, 3, 10), type "<f8">
In [519]: dset1[:]
Out[519]:
array([[[ 1.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,  0.,  0.],
        [ 1.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,  0.,  0.],
        [ 1.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,  0.,  0.]],

       [[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]]])

The original data occupies one corner of the expanded dataset; the newly added cells read back as zeros.

Now overwrite some of those zeros:

In [521]: dset1[1,:,:]=10
In [523]: dset1[0,:,5:]=2

In [524]: dset1[:]
Out[524]:
array([[[  1.,   1.,   1.,   1.,   1.,   2.,   2.,   2.,   2.,   2.],
        [  1.,   1.,   1.,   1.,   1.,   2.,   2.,   2.,   2.,   2.],
        [  1.,   1.,   1.,   1.,   1.,   2.,   2.,   2.,   2.,   2.]],

       [[ 10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.],
        [ 10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.],
        [ 10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.]]])

So yes, you can put both of your arrays in one h5 dataset, provided you specify a large enough maxshape to start with, e.g. (2,240,240,250), (240,240,500) or (240,240,250,2).

Or make one axis unlimited so it can keep growing, e.g. maxshape=(None, 240, 240, 250).
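Applied to the question, this could look like the sketch below. It assumes the file is created with a resizable leading axis from the start; the arrays are scaled down from (240, 240, 250) so the example runs quickly, and the temporary path is made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Smaller stand-ins for the (240, 240, 250) arrays in the question.
shape = (24, 24, 25)
dataset1 = np.random.rand(*shape)
dataset2 = np.random.rand(*shape)

path = os.path.join(tempfile.mkdtemp(), "data.h5")

# First write: give dataset1 a leading axis of length 1 and leave
# that axis unlimited (None) so more arrays can be added later.
with h5py.File(path, "w") as hf:
    hf.create_dataset(
        "dataset_1",
        data=dataset1[None, ...],
        maxshape=(None,) + shape,
    )

# Later: reopen in append mode, grow axis 0 by one slot,
# and write dataset2 into the new slot.
with h5py.File(path, "a") as hf:
    dset = hf["dataset_1"]
    new_len = dset.shape[0] + 1
    dset.resize(new_len, axis=0)
    dset[new_len - 1, ...] = dataset2
    final_shape = dset.shape
    stored = dset[...]
```

Putting the growing axis first means each appended array lands in its own slab, dset[i]. Using (240, 240, 250, None) instead would put the new axis last, as in the question's "4th dimension" wording.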

It looks like the main constraint is that you can't add a dimension after creation.
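If the file already holds a fixed rank-3 dataset, as the question's code produces, one possible workaround is to read it back, delete it, and recreate it with an extra resizable axis. This is not part of the original answer, only a sketch under assumptions: the old data fits in memory, the shapes are scaled down, and the file path is hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

shape = (24, 24, 25)  # scaled-down stand-in for (240, 240, 250)
dataset1 = np.random.rand(*shape)
dataset2 = np.random.rand(*shape)

path = os.path.join(tempfile.mkdtemp(), "data.h5")

# Existing file, written as in the question: rank 3, fixed shape.
with h5py.File(path, "w") as hf:
    hf.create_dataset("dataset_1", data=dataset1)

with h5py.File(path, "a") as hf:
    old = hf["dataset_1"][...]   # load the existing data into memory
    del hf["dataset_1"]          # unlink the rank-3 dataset

    # Recreate it with a new leading axis that is allowed to grow.
    new = hf.create_dataset(
        "dataset_1",
        data=old[None, ...],
        maxshape=(None,) + old.shape,
    )
    new.resize(2, axis=0)
    new[1, ...] = dataset2
    final_shape = new.shape
    stored = new[...]
```

Deleting a dataset unlinks it but may not shrink the file on disk, so for large data it can be cleaner to write the new dataset into a fresh file.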

Another approach is to combine the arrays before storing them, e.g.

dataset12 = np.stack((dataset1, dataset2), axis=0)
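A short sketch of how the axis argument of np.stack decides where the new dimension goes, again with scaled-down shapes and a made-up file path:

```python
import os
import tempfile

import h5py
import numpy as np

shape = (24, 24, 25)  # scaled-down stand-in for (240, 240, 250)
dataset1 = np.random.rand(*shape)
dataset2 = np.random.rand(*shape)

# New axis in front: one array per index along axis 0.
stacked_first = np.stack((dataset1, dataset2), axis=0)

# New axis at the end: the "4th dimension" layout from the question.
stacked_last = np.stack((dataset1, dataset2), axis=-1)

path = os.path.join(tempfile.mkdtemp(), "stacked.h5")
with h5py.File(path, "w") as hf:
    hf.create_dataset("dataset_12", data=stacked_last)
    saved_shape = hf["dataset_12"].shape
```

This only works when both arrays are available in memory at write time. For data that arrives later, the resizable dataset is the better fit.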


06-29 14:19