Problem description
I'm having trouble reading an HDF5 MATLAB 7.3 file with Python. I'm using h5py 2.0.1.
I can read all the matrices stored in the file, but I cannot read a list of strings. h5py shows the strings as a dataset of shape (1, 894) with type |O4. This dataset contains object references, which I tried to dereference using the h5file[obj_ref] syntax.
This yields something like dataset "FFb": shape (4, 1) type "<u2". I interpreted that as an array of four characters, which seems to be the ASCII representation of the string.
Is there an easy way to get the strings out?
Is there a package that provides MATLAB-to-Python HDF5 support?
Recommended answer
I assume you mean it is a cell array of strings in MATLAB? This output looks normal: the dataset is an array of objects (|O4 is the NumPy object datatype). Each object is an array of 2-byte integers (<u2 is the NumPy little-endian unsigned 2-byte integer datatype). h5py has no way of knowing that the dataset is a cell array of strings; it could just as well be a cell array of arbitrary 16-bit integers.
The easiest way to get the strings out would be to iterate over the dataset and convert the characters with unichr, like this:
strlist = [u''.join(unichr(c) for c in h5file[obj_ref]) for obj_ref in dataset]
What this does is iterate over the dataset (for obj_ref in dataset) to create a new list. For each object reference, it dereferences the object (h5file[obj_ref]) to get an array of integers. It converts each integer into a character (unichr(c)) and joins those characters into a single Unicode string (u''.join()).
Note that this produces a list of Unicode strings. If you are absolutely sure that every string contains only ASCII characters, you can replace u'' with '' and unichr with chr.
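For Python 3, where unichr no longer exists and chr covers all code points, the same idea could be sketched as below. This is a minimal sketch, not h5py's own API: the helper names flatten, decode_chars, and read_cell_strings are hypothetical, and it assumes that each dereferenced object iterates as (possibly nested) integer character codes, as the (4, 1) "<u2" dataset in the question suggests, with one code per character.

```python
def flatten(values):
    """Yield scalar items from arbitrarily nested iterables, e.g. a (4, 1) array."""
    for v in values:
        try:
            inner = iter(v)
        except TypeError:
            yield v  # not iterable, so treat it as a scalar
        else:
            for item in flatten(inner):
                yield item


def decode_chars(codes):
    """Join integer character codes into one string (assumes one code per character)."""
    return "".join(chr(int(c)) for c in flatten(codes))


def read_cell_strings(h5file, dataset):
    """Dereference every object reference in `dataset` and decode the result.

    `h5file` only needs to support h5file[ref]; that it is an open h5py file
    is an assumption of this sketch.
    """
    return [decode_chars(h5file[ref]) for ref in flatten(dataset)]
```

Flattening both levels sidesteps the iteration-order question: a (1, 894) dataset of references and a (4, 1) array of codes are each walked element by element regardless of which axis holds the data. With h5py, the call would presumably look like read_cell_strings(f, f[name]) for an open file f and whatever dataset name the cell array has in your file.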
Caveat: I don't have h5py; this post is based on my experience with MATLAB and NumPy. You may need to adjust the syntax or iteration order to suit your dataset.