Problem description
Let's say I have a bytes object that represents some data, and I want to convert it to a numpy array via np.genfromtxt. I am having trouble understanding how I should handle strings in this case. Let's start with the following:
from io import BytesIO
import numpy as np
text = b'test, 5, 1.2'
types = ['str', 'i4', 'f4']
np.genfromtxt(BytesIO(text), delimiter = ',', dtype = types)
This does not work. It raises
TypeError: data type not understood
If I change types so that types = ['c', 'i4', 'f4'], then the numpy call returns
array((b't', 5, 1.2000000476837158),
dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<f4')])
So it works, but obviously I am only getting the first letter of the string.
If I use c8 or c16 for the dtype of test, then I get
array(((nan+0j), 5, 1.2000000476837158),
dtype=[('f0', '<c8'), ('f1', '<i4'), ('f2', '<f4')])
which is garbage. I've also tried using a and U, with no success. How in the world do I get genfromtxt to recognize and save elements as a string?
I assume part of the issue is that this is a bytes object. However, if I instead use a normal string as text, and use StringIO rather than BytesIO, then genfromtxt raises an error:
TypeError: Can't convert 'bytes' object to str implicitly
Recommended answer
In my Python3 session:
In [568]: text = b'test, 5, 1.2'
# I don't need BytesIO since genfromtxt works with a list of
# byte strings, as from text.splitlines()
In [570]: np.genfromtxt([text], delimiter=',', dtype=None)
Out[570]:
array((b'test', 5, 1.2),
dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
If left to its own devices, genfromtxt deduces that the 1st field should be S4, that is, a bytestring of 4 characters.
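To illustrate that deduction, here is a minimal sketch with two rows, assuming NumPy 1.14 or later (where genfromtxt accepts an encoding argument). The second row and the autostrip=True setting are additions for illustration, not part of the original session. Passing encoding="utf-8" is also an assumption on my part: it makes the text column come back as str (U) rather than bytes (S).

```python
import numpy as np

# Two rows; the second one is a hypothetical addition for illustration.
lines = ["test, 5, 1.2", "longer_name, 7, 3.4"]

arr = np.genfromtxt(
    lines,
    delimiter=",",
    dtype=None,        # let genfromtxt infer a type per column
    encoding="utf-8",  # decode text columns to str instead of bytes
    autostrip=True,    # drop the spaces that follow each comma
)

print(arr.dtype)   # the string column is sized to the longest entry seen
print(arr["f0"])
```

With dtype=None the width of the string column appears to follow the longest value in that column, so no length has to be given by hand.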
I could also be explicit with the types:
In [571]: types=['S4', 'i4', 'f4']
In [572]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[572]:
array((b'test', 5, 1.2000000476837158),
dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f4')])
In [573]: types=['S10', 'i', 'f']
In [574]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[574]:
array((b'test', 5, 1.2000000476837158),
dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<f4')])
In [575]: types=['U10', 'int', 'float']
In [576]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[576]:
array(('test', 5, 1.2),
dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f8')])
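As a side note on the type codes involved, the short sketch below inspects them with np.dtype directly. It is only meant to show what each code stands for; it does not call genfromtxt at all.

```python
import numpy as np

# 'c8' and 'c16' are complex numbers (8 and 16 bytes), not character types,
# which would explain the nan+0j seen in the question for a text field.
print(np.dtype("c8"), np.dtype("c16"))   # complex64 complex128

# 'S<n>' is a fixed-width byte string, 'U<n>' a fixed-width unicode string.
s4 = np.dtype("S4")
u10 = np.dtype("U10")
print(s4.kind, s4.itemsize)    # S 4
print(u10.kind, u10.itemsize)  # U 40  (4 bytes per character)
```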
I can specify either S (bytes) or U (unicode), but I also have to specify the length. I don't think genfromtxt offers a way to let it deduce the length, other than dtype=None. I'd have to dig into the code to see how it deduces the string length.
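The effect of the explicit length can be sketched as follows, again assuming NumPy 1.14 or later for the encoding argument. The field names name, count and value and the helper load are hypothetical, chosen only for this example.

```python
import numpy as np

lines = ["test, 5, 1.2"]

def load(width):
    # `width` is the number of characters reserved for the string column.
    return np.genfromtxt(
        lines,
        delimiter=",",
        dtype=[("name", f"U{width}"), ("count", "i4"), ("value", "f8")],
        encoding="utf-8",
        autostrip=True,
    )

wide = load(10)    # room to spare: the full string is kept
narrow = load(2)   # too narrow: the string is cut to two characters

print(wide["name"], narrow["name"])
```

A width that is too small truncates silently, which matches the single letter returned for the 'c' type in the question, so it is safer to pick a generous width when the data is not known in advance.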
I could also create this array with np.array (by making it a tuple of substrings, and giving a correct dtype):
In [599]: np.array(tuple(text.split(b',')), dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
Out[599]:
array((b'test', 5, 1.2),
dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
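A slightly more defensive variant of the same idea is sketched below. It is not what the session above does: it decodes the bytes, strips the padding spaces, and converts the numeric fields explicitly before building the record, so it does not depend on NumPy parsing numbers out of byte strings.

```python
import numpy as np

text = b'test, 5, 1.2'
dt = np.dtype([('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])

# Decode, split on commas, and strip the padding spaces around each field.
name, count, value = (part.strip() for part in text.decode('ascii').split(','))

# Convert the numeric fields explicitly rather than relying on NumPy to
# parse them from strings when filling the structured record.
record = np.array((name.encode('ascii'), int(count), float(value)), dtype=dt)

print(record)
```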