Problem description
I'm running into an issue that I hope is simple, but I've hit a wall trying to figure it out. I'm attempting to strip the DateTime timestamp from the beginning of each line in a file, but the result is missing some of the characters that I'd like to keep. I'm fairly sure my regex is OK, and the output of group() on the match looks good. Lines that start or end with the letters "c" and "e" seem to get those characters trimmed off, while other lines work as expected.
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
>>> import re
>>>
>>> line2 = '[Wed Dec 01 10:24:24 2010] ceeeeest'
>>> a = re.match(r'(\[[A-Za-z]{3}\s)?([A-Za-z]{3})(\s+)([0-9]{1,4})(\s+)([0-9]{2})(:)([0-9]{2})(:)([0-9]{2})(\s[0-9]{1,4})?(\])?', line2, re.I)
>>> a.group()
'[Wed Dec 01 10:24:24 2010]'
>>> a.groups()
('[Wed ', 'Dec', ' ', '01', ' ', '10', ':', '24', ':', '24', ' 2010', ']')
>>> b = a.group()
>>> b
'[Wed Dec 01 10:24:24 2010]'
>>> c = line2.strip(b)
>>> c
'st'
>>>
I expect c to be "ceeeeest".
Or:
>>> line = '[Wed Dec 01 10:24:24 2010] testc'
>>> a = re.match(r'(\[[A-Za-z]{3}\s)?([A-Za-z]{3})(\s+)([0-9]{1,4})(\s+)([0-9]{2})(:)([0-9]{2})(:)([0-9]{2})(\s[0-9]{1,4})?(\])?', line, re.I)
>>> a.group()
'[Wed Dec 01 10:24:24 2010]'
>>> a.groups()
('[Wed ', 'Dec', ' ', '01', ' ', '10', ':', '24', ':', '24', ' 2010', ']')
>>> b = a.group()
>>> c = line.strip(b)
>>> c
'test'
>>>
I expect c to be "testc".
Is there something very basic I am missing here? Please enlighten me. Thank you.
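The method str.strip treats its argument as a set of characters, not as a substring: it removes every character found in that set from the beginning and the end of the string, and stops at the first character that is not in the set. Your timestamp contains "c" and "e" (from "Wed" and "Dec"), so those letters are stripped from the text as well. You probably want to use str.replace instead.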
>>> line = '[Wed Dec 01 10:24:24 2010] testc'
>>> line.replace('[Wed Dec 01 10:24:24 2010]', '')
' testc'
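To make the character-set behaviour concrete, here is a minimal sketch that reuses the values from the question. The variable names (timestamp, strip_chars, and so on) are only illustrative and do not come from the original session.

```python
line2 = '[Wed Dec 01 10:24:24 2010] ceeeeest'
timestamp = '[Wed Dec 01 10:24:24 2010]'

# strip() treats its argument as a set of characters, not as a prefix.
strip_chars = set(timestamp)

has_c = 'c' in strip_chars  # True: "c" comes from "Dec"
has_e = 'e' in strip_chars  # True: "e" comes from "Wed" and "Dec"
has_s = 's' in strip_chars  # False: stripping stops at the first "s"

# Everything up to the first character outside the set is removed,
# which includes the leading "ceeeee" of the text we wanted to keep.
stripped = line2.strip(timestamp)
print(stripped)  # st
```

The same reasoning explains the second case: in 'testc' the trailing "c" is in the set and is removed from the right-hand side, leaving 'test'.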
You can get rid of the leading white space by using str.lstrip, or use str.strip if you want to get rid of trailing white space too (called with no argument, both remove white space).
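Since the question already has a match object, another option is to slice the line at the end of the match rather than replacing text. The sketch below is only one possible approach: the pattern is a simplified, hypothetical stand-in for the regex in the question, and drop_timestamp is a made-up helper name.

```python
import re

# Simplified, hypothetical pattern for lines that look like
# '[Wed Dec 01 10:24:24 2010] text'. The trailing \s* also consumes
# the white space that follows the closing bracket.
TIMESTAMP = re.compile(
    r'\[[A-Za-z]{3}\s+[A-Za-z]{3}\s+\d{1,2}\s+'
    r'\d{2}:\d{2}:\d{2}\s+\d{4}\]\s*'
)


def drop_timestamp(line):
    """Return line without its leading timestamp, if it has one."""
    m = TIMESTAMP.match(line)
    if m is None:
        # No timestamp at the start: leave the line untouched.
        return line
    # m.end() is the index just past the matched prefix.
    return line[m.end():]


print(drop_timestamp('[Wed Dec 01 10:24:24 2010] ceeeeest'))  # ceeeeest
print(drop_timestamp('[Wed Dec 01 10:24:24 2010] testc'))     # testc
```

Slicing at m.end() only ever removes the matched prefix, so it cannot touch characters later in the line, and unlike str.replace it does not remove a second occurrence of the same timestamp text elsewhere in the line.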