np.fromregex with string as dtype


Problem description


I have a file with dates formatted as "*? year mo da ho mi se.condsdec" (with "?" being a one-character wildcard), i.e.:

*A 2014 12 31 23 59 59.123456


I would like to extract this as strings (to eventually be converted to datetimes).
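For the eventual datetime conversion, a minimal standard-library sketch, assuming the text after the three-character prefix ("*", the wildcard, a space) always follows `YYYY MM DD hh mm ss.ffffff`. Note that `%f` in `datetime.strptime` accepts at most six fractional digits, so eight-digit seconds such as `10.00000000` would need trimming first.

```python
from datetime import datetime

line = "*A 2014 12 31 23 59 59.123456"

# Drop the 3-character prefix "*A " and parse the remaining date/time text.
stamp = datetime.strptime(line[3:], "%Y %m %d %H %M %S.%f")
print(stamp)  # 2014-12-31 23:59:59.123456
```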


I am able to extract the date as a set of ints/floats using this regex pattern:

time_pattern=r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'
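To see what the six capture groups of this pattern return, a quick check with Python's `re` module on one line from the data file. The pattern demands exactly eight fractional digits in the seconds, so the six-digit example line (`59.123456`) is not matched at all.

```python
import re

time_pattern = r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'

# A line from the data file: the seconds have exactly 8 fractional digits.
m = re.match(time_pattern, "*  2003 11 26 00 00 10.00000000")
print(m.groups())  # ('2003', '11', '26', '00', '00', '10.00000000')

# Only 6 fractional digits here, so "\d{8}" fails and there is no match.
print(re.match(time_pattern, "*A 2014 12 31 23 59 59.123456"))  # None
```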


but not as a string. How do I get this to work using a string?


I am using python 3.4.3 with numpy 1.9.3.

import numpy as np

time_pattern = r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'
t_dtype = [('year', np.int16), ('month', np.int8), ('day', np.int8),
           ('hour', np.int8), ('min', np.int8), ('sec', np.float64)]
out = np.fromregex('filename', time_pattern, t_dtype)
print(out)
# returns [(2013, 11, 26, 0, 0, 10.0) (2013, 11, 26, 0, 0, 20.0)
#  (2013, 11, 26, 0, 0, 30.0)]


basic_t=r'$\*.{2}(.{28})'
t_dtype=[('date',str)]
out=np.fromregex('filename',basic_t,t_dtype)
#causes TypeError:
#TypeError: Empty data-type

Contents of the file filename:

*  2003 11 26 00 00 10.00000000
some text or interesting data
*  2003 11 26 00 00 20.00000000
more text
even more text
*  2003 11 26 00 00 30.00000000
etc.

Note that this is simple with a plain loop:

import re

with open('filename') as f:
    for line in f:
        m = re.search(basic_t, line)
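As an alternative to looping line by line, a sketch that anchors the pattern at the start of each line with `re.MULTILINE` and passes the compiled pattern straight to `np.fromregex`, which accepts compiled patterns and any object with a `.read()` method. It assumes a recent NumPy release on Python 3 (behavior on 1.9.3 may differ), and `io.StringIO` stands in for the real file.

```python
import io
import re

import numpy as np

text = """*  2003 11 26 00 00 10.00000000
some text or interesting data
*  2003 11 26 00 00 20.00000000
more text
even more text
*  2003 11 26 00 00 30.00000000
etc.
"""

# With re.MULTILINE, "^" matches at the start of every line, not only the file.
line_start = re.compile(r'^\*.{2}(.{28})', re.MULTILINE)

# One structured field holding a 28-character unicode string per match.
out = np.fromregex(io.StringIO(text), line_start, [('date', 'U28')])
print(out['date'])
```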


But I would like to have the output as a numpy array, and would like to keep runtime to a minimum.


Edit: Changing dtype to 'S' or np.str removes the error, but I still get an empty list as output.
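The second attempt has two separate problems, sketched below under the assumption of a recent NumPy release with `io.StringIO` standing in for the file. A bare `str` dtype has zero length, which fits the "Empty data-type" error. Independently, the leading `$` in `basic_t` is an end-of-string anchor, so nothing can follow it and the pattern never matches, which would explain the empty output even after the dtype is changed.

```python
import io
import re

import numpy as np

text = "*  2003 11 26 00 00 10.00000000\nmore text\n*  2003 11 26 00 00 20.00000000\n"

# 1) A bare `str` is a zero-length string dtype.
print(np.dtype(str).itemsize)  # 0

# 2) "$" anchors to the end of the string, so r'$\*...' can never match.
print(re.findall(r'$\*.{2}(.{28})', text))  # []

# Fix both: drop the "$" and give the field an explicit width of 28 characters.
basic_t = r'\*.{2}(.{28})'
out = np.fromregex(io.StringIO(text), basic_t, [('date', 'U28')])
print(out['date'])
```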

Recommended answer


Your problem is that you are setting the field dtypes to int or float when you should be specifying them as np.str_. You also need to specify the length of each string, like so:

import numpy as np

time_pattern = r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'
# The third tuple element is the string length; "sec" needs 11 chars, e.g. "10.00000000".
t_dtype = [('year', np.str_, 4), ('month', np.str_, 2), ('day', np.str_, 2),
           ('hour', np.str_, 2), ('min', np.str_, 2), ('sec', np.str_, 11)]

out = np.fromregex('filename', time_pattern, t_dtype)
print(out)
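The field width matters: the seconds group `\d{2}\.\d{8}` captures 11 characters, and a narrower string field is truncated without any error. A self-contained check, assuming a recent NumPy release, with `io.StringIO` in place of the file and `read_fields` as a hypothetical helper for illustration:

```python
import io

import numpy as np

text = """*  2003 11 26 00 00 10.00000000
some text or interesting data
*  2003 11 26 00 00 20.00000000
"""

time_pattern = r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'


def read_fields(sec_width):
    """Hypothetical helper: parse each date line into fixed-width string fields."""
    t_dtype = [('year', np.str_, 4), ('month', np.str_, 2), ('day', np.str_, 2),
               ('hour', np.str_, 2), ('min', np.str_, 2), ('sec', np.str_, sec_width)]
    return np.fromregex(io.StringIO(text), time_pattern, t_dtype)


# Strings longer than the field width are cut off silently.
print(read_fields(3)['sec'])
print(read_fields(11)['sec'])
```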


If you look at the second example of this, it shows how to handle strings
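On Python 3 the choice between `'S'` and `np.str_`/`'U'` also changes what comes back: `'S'` fields hold bytes, `'U'` fields hold text, and both need an explicit width. A small sketch, assuming a recent NumPy release and using `io.StringIO` in place of the file:

```python
import io

import numpy as np

text = "*  2003 11 26 00 00 10.00000000\n*  2003 11 26 00 00 20.00000000\n"
basic_t = r'\*.{2}(.{28})'

# 'S28' stores raw bytes, 'U28' stores unicode text; both are 28 characters wide.
as_bytes = np.fromregex(io.StringIO(text), basic_t, [('date', 'S28')])
as_text = np.fromregex(io.StringIO(text), basic_t, [('date', 'U28')])

print(as_bytes['date'][0])
print(as_text['date'][0])

# A bytes field can be converted to text afterwards with astype.
decoded = as_bytes['date'].astype('U28')
print(decoded[0])
```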
