Writing and appending arrays of float to the only dataset in an HDF5 file in C++

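Question

    for files

I am processing a number of files. Processing each file outputs several thousand arrays of float, and I will store the data from all files in one huge dataset in a single HDF5 file for further processing.

The problem is that I am currently confused about how to append my data to the HDF5 file. In the two loops above (see the comment in the code above), I want to append one 1-dimensional array of float to the HDF5 file at a time, not the whole thing at once. My data is in terabytes, so we can only append the data to the file.

There are several questions:


  1. How do I append the data in this case? What kind of function must I use?

  2. Right now I have fdim[0] = 928347543. I have tried putting HDF5's infinity flag (H5S_UNLIMITED) in, but the runtime execution complains. Is there a way to do this? I don't want to calculate the size of my data each time; is there a way to simply keep adding data without caring about the value of fdim?

Or is this not possible?

EDIT:

I have been following Simon's suggestion below, and here is my updated code: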

    hid_t desFi5;
    hid_t fid1;
    hid_t proplist;
    hsize_t fdim[2];

    desFi5 = H5Fcreate(SAVEFILEPATH, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    fdim[0] = 3;
    fdim[1] = 1; // H5S_UNLIMITED;
    fid1 = H5Screate_simple(2, fdim, NULL);
    cout << "----------------------------------Space done\n";

    proplist = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_layout(proplist, H5D_CHUNKED);
    int ndims = 2;
    hsize_t chunk_dims[2];
    chunk_dims[0] = 3;
    chunk_dims[1] = 1;
    H5Pset_chunk(proplist, ndims, chunk_dims);
    cout << "----------------------------------Property done\n";

    hid_t dataset1 = H5Dcreate(desFi5, "des", H5T_NATIVE_FLOAT, fid1, H5P_DEFAULT, proplist, H5P_DEFAULT);
    cout << "----------------------------------Dataset done\n";

    bufi = new float*[1];
    bufi[0] = new float[3];
    bufi[0][0] = 0;
    bufi[0][1] = 1;
    bufi[0][2] = 2;

    // hyperslab
    hsize_t start[2] = {0, 0};
    hsize_t stride[2] = {1, 1};
    hsize_t count[2] = {1, 1};
    hsize_t block[2] = {1, 3};
    H5Sselect_hyperslab(fid1, H5S_SELECT_OR, start, stride, count, block);
    cout << "----------------------------------hyperslab done\n";
    H5Dwrite(dataset1, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, *bufi);

    fdim[0] = 3;
    fdim[1] = H5S_UNLIMITED; // complains here
    H5Dset_extent(dataset1, fdim);
    cout << "----------------------------------extent done\n";

    // hyperslab2
    hsize_t start2[2] = {1, 0};
    hsize_t stride2[2] = {1, 1};
    hsize_t count2[2] = {1, 1};
    hsize_t block2[2] = {1, 3};
    H5Sselect_hyperslab(fid1, H5S_SELECT_OR, start2, stride2, count2, block2);
    cout << "----------------------------------hyperslab2 done\n";
    H5Dwrite(dataset1, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, *bufi);
    cout << "----------------------------------H5Dwrite done\n";
    H5Dclose(dataset1);
    cout << "----------------------------------Dataset closed\n";
    H5Pclose(proplist);
    cout << "----------------------------------Property list closed\n";
    H5Sclose(fid1);
    cout << "----------------------------------Dataspace fid1 closed\n";
    H5Fclose(desFi5);
    cout << "----------------------------------desFi5 closed\n";

My current output is:

    bash-3.2$ ./hdf5AppendTest.out
    ----------------------------------Space done
    ----------------------------------Property done
    ----------------------------------Dataset done
    ----------------------------------hyperslab done
    HDF5-DIAG: Error detected in HDF5 (1.8.10) thread 0:
      #000: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5D.c line 1103 in H5Dset_extent(): unable to set extend dataset
        major: Dataset
        minor: Unable to initialize object
      #001: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dint.c line 2179 in H5D__set_extent(): unable to modify size of data space
        major: Dataset
        minor: Unable to initialize object
      #002: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5S.c line 1874 in H5S_set_extent(): dimension cannot exceed the existing maximal size (new: 18446744073709551615 max: 1)
        major: Dataspace
        minor: Bad value
    ----------------------------------extent done
    ----------------------------------hyperslab2 done
    ----------------------------------H5Dwrite done
    ----------------------------------Dataset closed
    ----------------------------------Property list closed
    ----------------------------------Dataspace fid1 closed
    ----------------------------------desFi5 closed

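At the moment, I see that setting a dimension to unlimited with H5Dset_extent still causes a problem at runtime. (The problematic call is marked with // complains here in the code above.) I already have chunked data as specified by Simon, so what is the problem here?

On the other hand, without H5Dset_extent, I can write a test array of [0, 1, 2] just fine, but how can I make the code above write the test array to the file like this:

    [0, 1, 2]
    [0, 1, 2]
    [0, 1, 2]
    [0, 1, 2]
    ...
    ...

Recall: this is just a test array. The real data is much bigger, and I cannot hold the whole thing in RAM, so I must write the data part by part, one piece at a time.

EDIT 2:

I have followed more of Simon's suggestions. Here is the critical part: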

    hsize_t n = 3, p = 1;
    float *bufi_data = new float[n * p];
    float **bufi = new float*[n];
    for (hsize_t i = 0; i < n; ++i){
        bufi[i] = &bufi_data[i * n];
    }
    bufi[0][0] = 0.1;
    bufi[0][1] = 0.2;
    bufi[0][2] = 0.3;

    // hyperslab
    hsize_t start[2] = {0, 0};
    hsize_t count[2] = {3, 1};
    H5Sselect_hyperslab(fid1, H5S_SELECT_SET, start, NULL, count, NULL);
    cout << "----------------------------------hyperslab done\n";
    H5Dwrite(dataset1, H5T_NATIVE_FLOAT, H5S_ALL, fid1, H5P_DEFAULT, *bufi);

    bufi[0][0] = 0.4;
    bufi[0][1] = 0.5;
    bufi[0][2] = 0.6;
    hsize_t fdimNew[2];
    fdimNew[0] = 3;
    fdimNew[1] = 2;
    H5Dset_extent(dataset1, fdimNew);
    cout << "----------------------------------extent done\n";

    // hyperslab2
    hsize_t start2[2] = {0, 0}; // problem
    hsize_t count2[2] = {3, 1};
    H5Sselect_hyperslab(fid1, H5S_SELECT_SET, start2, NULL, count2, NULL);
    cout << "----------------------------------hyperslab2 done\n";
    H5Dwrite(dataset1, H5T_NATIVE_FLOAT, H5S_ALL, fid1, H5P_DEFAULT, *bufi);

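From the above, I got the following output in the HDF5 file:

    0.4 0.5 0.6
    0 0 0

After further experimenting with start2 and count2, I see that these variables only affect the starting index and the incrementing index for bufi. They do not move the writing position in my dataset at all.

Recall: the final result must be:

    0.1 0.2 0.3
    0.4 0.5 0.6

Also, it must be *bufi instead of bufi for H5Dwrite, Simon, because bufi gives me completely random numbers.

UPDATE 3:

For the selection part suggested by Simon:

    hsize_t start[2] = {0, 0};
    hsize_t count[2] = {1, 3};

    hsize_t start[2] = {1, 0};
    hsize_t count[2] = {1, 3};

These throw the following error:

    HDF5-DIAG: Error detected in HDF5 (1.8.10) thread 0:
      #000: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dio.c line 245 in H5Dwrite(): file selection+offset not within extent
        major: Dataspace
        minor: Out of range

count[2] should be {3,1} rather than {1,3}, I suppose? And for start[2], if I don't set it to {0,0}, it always yells out the above error.

Are you sure this is correct?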


Answer

You must use hyperslabs. That's what you need to write only part of a dataset. The function to do that is H5Sselect_hyperslab. Use it on fd1 and use fd1 as your file dataspace in your H5Dwrite call.

You need to create a chunked dataset in order to be able to set its maximum size to infinity. Create a dataset creation property list and use H5Pset_layout to make it chunked. Use H5Pset_chunk to set the chunk size. Then create your dataset using this property list.

You can do two things:

  1. Precompute the final size so you can create a dataset big enough. It looks like that's what you are doing.

  2. Extend your dataset as you go using H5Dset_extent. For this you need to set the maximum dimensions to infinity so you need a chunked dataset (see above).

In both cases, you need to select a hyperslab on the file dataspace in your H5Dwrite call (see above).
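To make the start/count convention concrete, here is a minimal sketch in plain C++ (no HDF5 calls) that lists the row-major element offsets covered by a contiguous start/count selection in a 2D dataset. The name `selected_offsets` is hypothetical and only illustrates the indexing; it is not part of the HDF5 API, and it assumes stride and block are both 1 (which is what passing NULL for them means).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an HDF5 function: returns the row-major linear
// offsets covered by a contiguous 2D selection (start, count) inside a
// dataset that has `ncols` columns. Stride and block are assumed to be 1.
std::vector<std::size_t> selected_offsets(std::size_t ncols,
                                          const std::size_t start[2],
                                          const std::size_t count[2])
{
    // The selection must not run past the last column.
    assert(start[1] + count[1] <= ncols);

    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < count[0]; ++i) {
        for (std::size_t j = 0; j < count[1]; ++j) {
            offsets.push_back((start[0] + i) * ncols + (start[1] + j));
        }
    }
    return offsets;
}
```

For a dataset with 3 columns, start = {2, 0} and count = {3, 3} cover offsets 6 through 14, that is, rows 2 to 4. The first index always counts rows and the second counts columns, so count = {1, 3} means "one row of three values", not "three rows of one value".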

Walkthrough working code

#include <iostream>
#include <hdf5.h>

// Constants
const char saveFilePath[] = "test.h5";
const hsize_t ndims = 2;
const hsize_t ncols = 3;

int main()
{

First, create a hdf5 file.

    hid_t file = H5Fcreate(saveFilePath, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    std::cout << "- File created" << std::endl;

Then create a 2D dataspace. The size of the first dimension is unlimited. We set it initially to 0 to show how you can extend the dataset at each step. You could also set it to the size of the first buffer you are going to write, for instance. The size of the second dimension is fixed.

    hsize_t dims[ndims] = {0, ncols};
    hsize_t max_dims[ndims] = {H5S_UNLIMITED, ncols};
    hid_t file_space = H5Screate_simple(ndims, dims, max_dims);
    std::cout << "- Dataspace created" << std::endl;

Then create a dataset creation property list. The layout of the dataset has to be chunked when using unlimited dimensions. The choice of the chunk size affects performance, both in time and disk space. If the chunks are very small, you will have a lot of overhead. If they are too large, you might allocate space that you don't need and your files might end up being too large. This is a toy example, so we will choose chunks of two lines.

    hid_t plist = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_layout(plist, H5D_CHUNKED);
    hsize_t chunk_dims[ndims] = {2, ncols};
    H5Pset_chunk(plist, ndims, chunk_dims);
    std::cout << "- Property list created" << std::endl;
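As a rough illustration of the space side of that trade-off, the sketch below does the arithmetic in plain C++. It assumes that storage along the extendible dimension is allocated in whole chunks, which is my understanding of the default chunked layout; the function names are hypothetical and nothing here calls HDF5.

```cpp
#include <cassert>
#include <cstddef>

// Number of whole chunks needed along the extendible dimension to hold
// `rows` rows when each chunk spans `chunk_rows` rows (ceiling division).
std::size_t chunks_needed(std::size_t rows, std::size_t chunk_rows)
{
    assert(chunk_rows > 0);
    return (rows + chunk_rows - 1) / chunk_rows;
}

// Rows that are covered by allocated chunks but hold no data yet.
std::size_t padding_rows(std::size_t rows, std::size_t chunk_rows)
{
    return chunks_needed(rows, chunk_rows) * chunk_rows - rows;
}
```

With the 5 rows written in this walkthrough and chunks of 2 rows, this gives 3 chunks and 1 unused row. With chunks of 1024 rows it gives a single chunk with 1019 unused rows, which is the "allocate space that you don't need" case.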

Create the dataset.

    hid_t dset = H5Dcreate(file, "dset1", H5T_NATIVE_FLOAT, file_space, H5P_DEFAULT, plist, H5P_DEFAULT);
    std::cout << "- Dataset 'dset1' created" << std::endl;

Close resources. The dataset is now created, so we don't need the property list anymore. We don't need the file dataspace anymore either: once the dataset is extended, this handle becomes stale because it still holds the previous extent. So we will have to grab the updated file dataspace anyway.

    H5Pclose(plist);
    H5Sclose(file_space);

We will now append two buffers to the end of the dataset. The first one will be two lines long. The second one will be three lines long.

First buffer

We create a 2D buffer (contiguous in memory, row-major order). We will allocate enough memory to store 3 lines, so we can reuse the buffer. Let us create an array of pointers so we can use the b[i][j] notation instead of buffer[i * ncols + j]. This is purely aesthetic.

    hsize_t nlines = 3;
    float *buffer = new float[nlines * ncols];
    float **b = new float*[nlines];
    for (hsize_t i = 0; i < nlines; ++i){
        b[i] = &buffer[i * ncols];
    }
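As an aside, the same contiguous row-major layout can be had without manual new/delete. The sketch below is one possible alternative, not the approach used in this walkthrough: `RowMajorBuffer` is a hypothetical wrapper around std::vector whose data() pointer could be passed to H5Dwrite wherever buffer is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: a contiguous, row-major 2D float buffer.
// Element (i, j) lives at linear index i * ncols + j.
class RowMajorBuffer {
public:
    RowMajorBuffer(std::size_t nrows, std::size_t ncols)
        : nrows_(nrows), ncols_(ncols), data_(nrows * ncols, 0.0f) {}

    // 2D access, equivalent to b[i][j] in the pointer-array version.
    float& at(std::size_t i, std::size_t j)
    {
        assert(i < nrows_ && j < ncols_);
        return data_[i * ncols_ + j];
    }

    // Pointer to the first element of the contiguous storage.
    float* data() { return data_.data(); }

    // Total number of elements.
    std::size_t size() const { return data_.size(); }

private:
    std::size_t nrows_;
    std::size_t ncols_;
    std::vector<float> data_;
};
```

The memory is released automatically when the object goes out of scope, so the two delete[] calls at the end of the program would not be needed with this variant.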

Initial values in buffer to be written in the dataset:

    b[0][0] = 0.1;
    b[0][1] = 0.2;
    b[0][2] = 0.3;
    b[1][0] = 0.4;
    b[1][1] = 0.5;
    b[1][2] = 0.6;

We create a memory dataspace to indicate the size of our buffer in memory. Remember the first buffer is only two lines long.

    dims[0] = 2;
    dims[1] = ncols;
    hid_t mem_space = H5Screate_simple(ndims, dims, NULL);
    std::cout << "- Memory dataspace created" << std::endl;

We now need to extend the dataset. We set the initial size of the dataset to 0x3, so we need to extend it first. Note that we extend the dataset itself, not its dataspace. Remember the first buffer is only two lines long.

    dims[0] = 2;
    dims[1] = ncols;
    H5Dset_extent(dset, dims);
    std::cout << "- Dataset extended" << std::endl;

Select hyperslab on file dataset.

    file_space = H5Dget_space(dset);
    hsize_t start[2] = {0, 0};
    hsize_t count[2] = {2, ncols};
    H5Sselect_hyperslab(file_space, H5S_SELECT_SET, start, NULL, count, NULL);
    std::cout << "- First hyperslab selected" << std::endl;

Write buffer to dataset. mem_space and file_space should now have the same number of elements selected. Note that buffer and &b[0][0] are equivalent.

    H5Dwrite(dset, H5T_NATIVE_FLOAT, mem_space, file_space, H5P_DEFAULT, buffer);
    std::cout << "- First buffer written" << std::endl;

We can now close the file dataspace. We could close the memory dataspace now and create a new one for the second buffer, but we will simply update its size.

    H5Sclose(file_space);

Second buffer

New values in buffer to be appended to the dataset:

    b[0][0] = 1.1;
    b[0][1] = 1.2;
    b[0][2] = 1.3;
    b[1][0] = 1.4;
    b[1][1] = 1.5;
    b[1][2] = 1.6;
    b[2][0] = 1.7;
    b[2][1] = 1.8;
    b[2][2] = 1.9;

Resize the memory dataspace to indicate the new size of our buffer. The second buffer is three lines long.

    dims[0] = 3;
    dims[1] = ncols;
    H5Sset_extent_simple(mem_space, ndims, dims, NULL);
    std::cout << "- Memory dataspace resized" << std::endl;

Extend dataset. Note that in this simple example, we know that 2 + 3 = 5. In general, you could read the current extent from the file dataspace and add the desired number of lines to it.

    dims[0] = 5;
    dims[1] = ncols;
    H5Dset_extent(dset, dims);
    std::cout << "- Dataset extended" << std::endl;

Select hyperslab on file dataset. Again in this simple example, we know that 0 + 2 = 2. In general, you could read the current extent from the file dataspace. The second buffer is three lines long.

    file_space = H5Dget_space(dset);
    start[0] = 2;
    start[1] = 0;
    count[0] = 3;
    count[1] = ncols;
    H5Sselect_hyperslab(file_space, H5S_SELECT_SET, start, NULL, count, NULL);
    std::cout << "- Second hyperslab selected" << std::endl;

Append buffer to dataset.

    H5Dwrite(dset, H5T_NATIVE_FLOAT, mem_space, file_space, H5P_DEFAULT, buffer);
    std::cout << "- Second buffer written" << std::endl;

The end: let's close all the resources:

    delete[] b;
    delete[] buffer;
    H5Sclose(file_space);
    H5Sclose(mem_space);
    H5Dclose(dset);
    H5Fclose(file);
    std::cout << "- Resources released" << std::endl;
}


NB: I removed the previous updates because the answer was too long. If you are interested, browse the history.

