Question
I am using h5py to build a dataset. Since I want to store arrays whose size differs along one dimension, I use the h5py special_dtype vlen. However, I see behavior I can't explain; maybe you can help me understand what is happening:
>>> import h5py
>>> import numpy as np
>>> fp = h5py.File(datasource_fname, mode='w')
>>> dt = h5py.special_dtype(vlen=np.dtype('float32'))
>>> train_targets = fp.create_dataset('target_sequence', shape=(9549, 5,), dtype=dt)
>>> test
Out[130]:
array([[ 0., 1., 1., 1., 0., 1., 1., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1.]])
>>> train_targets[0] = test
>>> train_targets[0]
Out[138]:
array([ array([ 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1.], dtype=float32),
array([ 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0.], dtype=float32),
array([ 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0.], dtype=float32),
array([ 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0.], dtype=float32),
array([ 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)], dtype=object)
I do expect train_targets[0] to have this shape, but I can't recognize the rows of my array in it. They seem to be completely jumbled, although consistently so: every time I run the code above, train_targets[0] looks the same.
To clarify: the first element in my train_targets, in this case test, has shape (5,11), but the second element might have shape (5,38), which is why I use vlen.
Thanks for your help,
Mat
Recommended answer
I think train_targets[0] = test has stored your (5,11) array, read in F (column-major) order, in a row of train_targets. According to the (9549,5) shape, that's a row of 5 elements. And since the dtype is vlen, each element is a 1d array of length 11.
That's what you get back in train_targets[0]: an array of 5 arrays, each of shape (11,), with values taken from test in F order.
So I think there are 2 issues: what a 2d shape means, and what vlen allows.
My version of h5py is pre v2.3, so I only get string vlen. But I suspect your problem may be that vlen only works with 1d arrays; it is an extension, so to speak, of byte strings.
Does the 5 in shape=(9549, 5,) have anything to do with the 5 in test.shape? I don't think it does, at least not as numpy and h5py see it.
When I make a file following the string vlen example:
>>> f = h5py.File('foo.hdf5', 'w')  # explicit write mode; newer h5py opens read-only by default
>>> dt = h5py.special_dtype(vlen=str)
>>> ds = f.create_dataset('VLDS', (100,100), dtype=dt)
and then do
ds[0]='this one string'
and look at ds[0], I get an object array with 100 elements, each being this string. That is, I've set a whole row of ds.
ds[0,0]='another'
is the correct way to set just one element.
vlen is 'variable length', not 'variable shape'. While the https://www.hdfgroup.org/HDF5/doc/TechNotes/VLTypes.html documentation is not entirely clear on this, I think you can store 1d arrays with shapes (11,) and (38,) with vlen, but not 2d ones.
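As a rough sketch of what that implies for the (5,11) and (5,38) case: if the dataset keeps its (N, 5) shape, each of the 5 cells in a row can hold one 1d array, so a 2d array can be written one row per cell. The helper store_rows, the file name vlen_demo.h5, and the in-memory core driver are my own choices for illustration, not something from the question, and the exact behavior may differ between h5py versions.

```python
import h5py
import numpy as np

dt = h5py.special_dtype(vlen=np.dtype('float32'))


def store_rows(ds, index, arr2d):
    """Hypothetical helper: write row j of arr2d into the vlen cell ds[index, j]."""
    for j, row in enumerate(arr2d):
        ds[index, j] = np.asarray(row, dtype='float32')


# The core driver with backing_store=False keeps the file in memory only.
with h5py.File('vlen_demo.h5', 'w', driver='core', backing_store=False) as fp:
    ds = fp.create_dataset('target_sequence', shape=(2, 5), dtype=dt)

    a = np.arange(55, dtype='float32').reshape(5, 11)   # stands in for test
    b = np.arange(190, dtype='float32').reshape(5, 38)  # a wider second element

    store_rows(ds, 0, a)
    store_rows(ds, 1, b)

    # Each cell comes back as a 1d array; stacking rebuilds the 2d array.
    back_a = np.stack([ds[0, j] for j in range(5)])
    back_b = np.stack([ds[1, j] for j in range(5)])
```

Every cell is set with two indices, in the same way as ds[0,0]='another' above, so no broadcasting across the row takes place.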
Actually, the train_targets[0] output is reproduced with:
In [54]: test1 = np.empty((5,), dtype=object)
In [55]: for i in range(5):
   ....:     test1[i] = test.T.flatten()[i:i+11]
Each sub-array holds 11 values taken from the flattened transpose (F order), but the window is shifted by one element for each successive sub-array.
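For anyone who wants to check this without an HDF5 file, here is a self-contained numpy version of the same reproduction. The test array is copied from the question; the name flat is introduced here only for readability.

```python
import numpy as np

# The (5, 11) array from the question.
test = np.array([
    [0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
], dtype='float32')

# Flattening the transpose walks test column by column (F order).
flat = test.T.flatten()

# Sub-array i is an 11-element window starting at offset i, not at i * 11.
test1 = np.empty((5,), dtype=object)
for i in range(5):
    test1[i] = flat[i:i + 11]

print(test1[0])  # compare with the first sub-array of train_targets[0]
```

Because consecutive windows start only one element apart, they overlap in 10 of their 11 values, which is why the stored rows look jumbled yet identical on every run.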