Problem description

I am using h5py to build a dataset. Since I want to store arrays whose second dimension varies from one element to the next, I use h5py's special_dtype with vlen. However, I'm seeing behavior I can't explain; maybe you can help me understand what is happening:

```
>>> import h5py
>>> import numpy as np
>>> fp = h5py.File(datasource_fname, mode='w')
>>> dt = h5py.special_dtype(vlen=np.dtype('float32'))
>>> train_targets = fp.create_dataset('target_sequence', shape=(9549, 5,), dtype=dt)
>>> test
Out[130]:
array([[ 0.,  1.,  1.,  1.,  0.,  1.,  1.,  0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.]])
>>> train_targets[0] = test
>>> train_targets[0]
Out[138]:
array([ array([ 0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.], dtype=float32),
        array([ 1.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.], dtype=float32),
        array([ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.], dtype=float32),
        array([ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.], dtype=float32),
        array([ 0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.], dtype=float32)], dtype=object)
```

I do expect train_targets[0] to have this shape, but I can't recognize the rows of my array in it. They seem totally jumbled, yet consistently so: every time I run the code above, train_targets[0] looks the same.

To clarify: the first element of train_targets, in this case test, has shape (5,11), but the second element might have shape (5,38), which is why I use vlen.

Thanks for your help

Mat

Recommended answer

I think

```
train_targets[0] = test
```

has stored your (5,11) array as an F-ordered array in a row of train_targets. According to the (9549,5) shape, that's a row of 5 elements. And since it is vlen, each element is a 1d array of length 11.

That's what you get back in train_targets[0]: an array of 5 arrays, each of shape (11,), with values taken from test in F order.
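A small NumPy check makes "F order" concrete. It re-types test by hand from the values printed in the question, which is an assumption about its exact contents. It then shows that reading the array column by column (order='F') gives the same result as flattening its transpose. Finally, it confirms that the first 11 values in that order match the first sub-array h5py returned. This uses NumPy only and does not touch h5py.

```python
import numpy as np

# test as printed in the question (typed in by hand): shape (5, 11)
test = np.array([
    [0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
], dtype=np.float32)

# Fortran (column-major) order walks down each column before moving right,
# which is the same as flattening the transpose in the default C order
f_values = test.ravel(order='F')
assert np.array_equal(f_values, test.T.flatten())

# First sub-array printed for train_targets[0] in the question
first_returned = np.array([0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1], dtype=np.float32)
print(np.array_equal(f_values[:11], first_returned))  # True
```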

So I think there are two issues: what a 2d shape means, and what vlen allows.

My version of h5py is pre v2.3, so I only get string vlen. But I suspect your problem may be that vlen only works with 1d arrays, an extension, so to speak, of byte strings.

Does the 5 in shape=(9549, 5,) have anything to do with 5 in the test.shape? I don't think it does, at least not as numpy and h5py see it.

When I make a file following the string vlen example:

```
>>> f = h5py.File('foo.hdf5', 'w')
>>> dt = h5py.special_dtype(vlen=str)
>>> ds = f.create_dataset('VLDS', (100,100), dtype=dt)
```

and then do:

```
ds[0]='this one string'
```

and look at ds[0], I get an object array with 100 elements, each being this string. That is, I've set a whole row of ds.

```
ds[0,0]='another'
```

is the correct way to set just one element.
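A small in-memory file lets you check the difference between setting a row and setting a single element. This sketch makes several assumptions:

- It targets h5py 3.x.
- It uses the 'core' driver with backing_store=False, so nothing is written to disk.
- It uses h5py.string_dtype() in place of the older special_dtype(vlen=str).
- It shrinks the dataset to a (3, 4) shape.

Note that h5py 3.x reads variable-length strings back as bytes.

```python
import h5py

# In-memory file: the 'core' driver with backing_store=False never writes to disk
with h5py.File('scratch.h5', 'w', driver='core', backing_store=False) as f:
    ds = f.create_dataset('VLDS', (3, 4), dtype=h5py.string_dtype())

    # One integer index selects a whole row; the scalar string is broadcast to it
    ds[0] = 'this one string'
    # Two integer indices select a single element
    ds[1, 2] = 'another'

    row0 = ds[0]  # object array with 4 elements, all the same string
    row1 = ds[1]  # only position 2 was set

print(row0.shape)  # (4,)
print(row0[0])     # b'this one string' under h5py 3.x, which reads strings as bytes
print(row1[2])     # b'another'
```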

vlen is 'variable length', not 'variable shape'. While the https://www.hdfgroup.org/HDF5/doc/TechNotes/VLTypes.html documentation is not entirely clear on this, I think you can store 1d arrays with shape (11,) and (38,) with vlen, but not 2d ones.
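Following that reading, one way to keep a (5,11) array and a (5,38) array in the same vlen dataset is to give each of their 5 rows its own element. Each row is then assigned on its own with two integer indices, as with ds[0,0] above. This is a minimal sketch with these assumptions:

- It targets h5py 3.x, where h5py.vlen_dtype is the newer spelling of special_dtype(vlen=...).
- It uses an in-memory file.
- It shrinks the dataset to 2 rows.
- It fills the dataset with random stand-in arrays rather than the real targets.

```python
import h5py
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for two targets whose second dimension differs
targets = [rng.random((5, 11), dtype=np.float32),
           rng.random((5, 38), dtype=np.float32)]

dt = h5py.vlen_dtype(np.dtype('float32'))

with h5py.File('scratch.h5', 'w', driver='core', backing_store=False) as f:
    ds = f.create_dataset('target_sequence', shape=(len(targets), 5), dtype=dt)

    for i, arr in enumerate(targets):
        for j in range(arr.shape[0]):
            # One vlen element per row of arr: a 1-D float32 array of any length
            ds[i, j] = arr[j]

    # Each dataset row reads back as an object array of 5 1-D arrays;
    # stacking them rebuilds the original 2-D array
    restored = [np.stack(ds[i]) for i in range(len(targets))]

for original, back in zip(targets, restored):
    print(back.shape, np.array_equal(original, back))
# (5, 11) True
# (5, 38) True
```

Here the 5 in the dataset shape is chosen to match the number of rows per target. Each element still holds only a 1d array, which is what vlen supports.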

Actually, the train_targets[0] output is reproduced with:

```
In [54]: test1=np.empty((5,),dtype=object)
In [55]: for i in range(5):
    test1[i]=test.T.flatten()[i:i+11]
```

Each sub-array is 11 consecutive values taken from the transpose (F order), but the starting offset shifts by one for each sub-array.
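The offset pattern can be checked directly against the transcript. This NumPy-only sketch does three things:

1. It re-types test and the five sub-arrays printed for train_targets[0] in the question.
2. It builds the shifted 11-value windows over the F-order values.
3. It compares the windows with the printed sub-arrays.

It only reproduces the observed output and makes no claim about how h5py produces it internally.

```python
import numpy as np

# test as printed in the question (typed in by hand): shape (5, 11)
test = np.array([
    [0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
], dtype=np.float32)

# The five sub-arrays printed for train_targets[0] in the question
observed = np.array([
    [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
], dtype=np.float32)

f_values = test.T.flatten()  # same as test.ravel(order='F')
# Window i starts at offset i, not at 11 * i as a plain reshape would
windows = np.stack([f_values[i:i + 11] for i in range(5)])

print(np.array_equal(windows, observed))  # True
```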

05-31 04:18