Problem description
I have a text file containing all my data:
data = 'B:/tempfiles/bla.dat'
From the text file, I list the column headers and their types:
col_headers = [('VW_3_Avg','<f8'),('Lvl_Max(1)','<f8')]
Then I create a dictionary holding the options:
kwargs = dict(delimiter=',',
              deletechars=' ',
              dtype=col_headers,
              skip_header=4,
              skip_footer=0,
              filling_values='NaN',
              missing_values={'"NAN"'})
Now I import the data into the variable datafile:
datafile = scipy.genfromtxt(data, **kwargs)
Then I access the columns with:
VW1 = datafile['VW_3_Avg']
Lv1 = datafile['Lvl_Max(1)']
It works perfectly with the first one (containing underscores), but not with the second (containing parentheses). I get an error, not only for this entry but for every name that contains parentheses:
ValueError: field named Lvl_Max(1) not found
When I change the parentheses in the text file to underscores, it works perfectly. But I can't tell why it won't let me use parentheses, and I can't change the text file format because the file is produced externally. Of course I could replace the parentheses with underscores in a script, but I think it shouldn't be a big deal to get this right. Where and why am I missing the correct formatting precedence in this case?
Recommended answer
This behaviour is documented. The NameValidator class in lib/_iotools.py parses the names passed to genfromtxt:
class NameValidator(object):
    """
    Object to validate a list of strings to use as field names.

    The strings are stripped of any non alphanumeric character, and spaces
    are replaced by '_'. During instantiation, the user can define a list
    of names to exclude, as well as a list of invalid characters. Names in
    the exclusion list are appended a '_' character.

    Once an instance has been created, it can be called with a list of
    names, and a list of valid names will be created. The `__call__`
    method accepts an optional keyword "default" that sets the default name
    in case of ambiguity. By default this is 'f', so that names will
    default to `f0`, `f1`, etc.
The relevant line in your case is "The strings are stripped of any non alphanumeric character".
You can see the behaviour by calling NameValidator.validate on a list of names that contain other non-alphanumeric characters:
In [17]: from numpy.lib._iotools import NameValidator
In [18]: l = ["foo(1)","bar!!!","foo bar??"]
In [19]: NameValidator().validate(l)
Out[19]: ('foo1', 'bar', 'foo_bar')
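The set of stripped characters belongs to the validator instance and is not fixed. As a minimal sketch (assuming `numpy.lib._iotools` is importable in your NumPy version; it is a private module, so its location is not guaranteed), passing a custom `deletechars` to `NameValidator` keeps the parentheses in the column names from the question:

```python
from numpy.lib._iotools import NameValidator  # private module, location may change

names = ["VW_3_Avg", "Lvl_Max(1)"]

# Default validator: '(' and ')' are part of the default delete set.
default_names = NameValidator().validate(names)

# Custom validator: the delete set holds only a space (plus the double quote
# that NameValidator always adds), so the parentheses survive.
kept_names = NameValidator(deletechars=" ").validate(names)

print(default_names)
print(kept_names)
```

In the question, `deletechars=' '` was passed to genfromtxt and the parentheses were still removed, so it is worth inspecting `datafile.dtype.names` after loading instead of assuming the argument was applied to names given through `dtype`.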
And the same using genfromtxt:
In [24]: datafile = np.genfromtxt("foo.txt", dtype=[('foo!! bar??', '<f8'), ('foo bar bar$', '<f8')], delimiter=",",defaultfmt="%")
In [25]: datafile.dtype
Out[25]: dtype([('foo_bar', '<f8'), ('foo_bar_bar', '<f8')])
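One possible workaround, sketched under the assumption that the column order in `dtype` matches the file: read the sanitized names back from `datafile.dtype.names` and map the original headers onto them. The inline sample data and the `lookup` dictionary are illustrative and not part of the original question.

```python
import io
import numpy as np

col_headers = [("VW_3_Avg", "<f8"), ("Lvl_Max(1)", "<f8")]

# Hypothetical two-row sample standing in for the real data file.
sample = io.StringIO("1.0,2.0\n3.0,4.0\n")
datafile = np.genfromtxt(sample, delimiter=",", dtype=col_headers)

# Map each original header to whatever name genfromtxt actually stored.
original = [name for name, _ in col_headers]
lookup = dict(zip(original, datafile.dtype.names))

VW1 = datafile[lookup["VW_3_Avg"]]
Lv1 = datafile[lookup["Lvl_Max(1)"]]
print(lookup)
print(Lv1)
```

Going through `dtype.names` avoids depending on the private `NameValidator` class and keeps working whichever characters the installed NumPy version decides to strip.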