Problem description
I have a file with dates formatted as "1:*? year mo da ho mi se.condsdec" (where "?" is a one-character wildcard), i.e.:
*A 2014 12 31 23 59 59.123456
I would like to extract these dates as strings (to eventually be converted to datetime strings).
I am able to extract the date as a set of ints/floats using the regex pattern:
time_pattern=r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'
but not as a string. How do I get this to work using a string?
I am using Python 3.4.3 with NumPy 1.9.3.
import numpy as np
time_pattern = r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'
t_dtype = [('year', np.int16), ('month', np.int8), ('day', np.int8),
           ('hour', np.int8), ('min', np.int8), ('sec', np.float64)]
out = np.fromregex('filename', time_pattern, t_dtype)
print(out)
# returns [(2013, 11, 26, 0, 0, 10.0) (2013, 11, 26, 0, 0, 20.0)
#  (2013, 11, 26, 0, 0, 30.0)]
basic_t = r'$\*.{2}(.{28})'
t_dtype = [('date', str)]
out = np.fromregex('filename', basic_t, t_dtype)
# causes TypeError:
# TypeError: Empty data-type
With the file filename:
*  2003 11 26 00 00 10.00000000
some text or interesting data
*  2003 11 26 00 00 20.00000000
more text
even more text
*  2003 11 26 00 00 30.00000000
etc.
Note that the simple approach is:
import re
with open(file) as f:
    for line in f:
        m = re.search(basic_t, line)
But I would like to have the output as a numpy array, and would like to keep runtime to a minimum.
Edit: Changing the dtype to 'S' or np.str removes the error, but I still get an empty list as output.
Recommended answer
Your problem is that you are setting the dtype as int or float when you should be specifying the fields as np.str_. You also need to specify the length of each string, so:
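import numpy as np
time_pattern = r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'
# Each string field is as wide as the text its group captures;
# the seconds group (\d{2}\.\d{8}) is 11 characters long.
t_dtype = [('year', np.str_, 4), ('month', np.str_, 2), ('day', np.str_, 2),
           ('hour', np.str_, 2), ('min', np.str_, 2), ('sec', np.str_, 11)]
out = np.fromregex('filename', time_pattern, t_dtype)
print(out)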
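To try this without touching the disk, here is a minimal sketch that feeds np.fromregex an in-memory file. It assumes a recent NumPy, where np.fromregex accepts any object with a read() method and text content. The sample text assumes two characters sit between "*" and the year, as the pattern `\*.{2}` requires. The 'U4'-style strings are the fixed-width unicode dtype shorthand. The last lines show why the field width matters: a string stored in a field that is too narrow is silently truncated.

```python
import io
import numpy as np

# Stand-in for the file from the question (assumption: two characters
# follow '*' before the year, which is what `\*.{2}` expects).
sample = io.StringIO(
    "*  2003 11 26 00 00 10.00000000\n"
    "some text or interesting data\n"
    "*  2003 11 26 00 00 20.00000000\n"
    "more text\n"
)

time_pattern = r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'

# One fixed-width unicode field per capture group; each width equals the
# number of characters that group captures.
t_dtype = [('year', 'U4'), ('month', 'U2'), ('day', 'U2'),
           ('hour', 'U2'), ('min', 'U2'), ('sec', 'U11')]

out = np.fromregex(sample, time_pattern, t_dtype)
print(out['year'])
print(out['sec'])

# A field that is too narrow keeps only the leading characters.
narrow = np.array(['10.00000000'], dtype='U3')
print(narrow[0])
```

Lines that do not match the pattern, such as "more text", are skipped because the function collects only the regex matches.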
If you look at the second example of this, it shows how to handle strings.
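The question also tried to capture the whole date as a single string and got an empty result even after the dtype error went away. A likely cause is the leading `$` in that pattern: `$` anchors at the end of the text, so nothing after it can match `*`. The sketch below is one possible fix, not part of the original answer. It assumes a recent NumPy and uses a compiled pattern with `^` and re.MULTILINE plus a single 28-character field. The name `line_pattern` and the slicing into ISO form are illustrative choices. The conversion at the end addresses the stated goal of turning the strings into datetimes.

```python
import io
import re
import numpy as np

# Stand-in for the file from the question (assumption: two characters
# follow '*' before the year).
sample = io.StringIO(
    "*  2003 11 26 00 00 10.00000000\n"
    "some text or interesting data\n"
    "*  2003 11 26 00 00 20.00000000\n"
)

# '^' with re.MULTILINE anchors at the start of every line.
line_pattern = re.compile(r'^\*.{2}(.{28})', re.MULTILINE)

# One field, 28 characters wide: "YYYY MM DD hh mm ss.ssssssss".
dates = np.fromregex(sample, line_pattern, [('date', 'U28')])
print(dates['date'])

# Rearrange each string into ISO 8601 form so NumPy can parse it.
iso = [
    f"{s[0:4]}-{s[5:7]}-{s[8:10]}T{s[11:13]}:{s[14:16]}:{s[17:]}"
    for s in dates['date']
]
stamps = np.array(iso, dtype='datetime64[ns]')
print(stamps)
```

Passing a compiled pattern is what lets the re.MULTILINE flag reach the search, since a plain pattern string is compiled without flags.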