Problem description
Possible duplicate:
Python split() without removing the delimiter
I wish to split a string as follows:
text = " T?e qu!ck ' brown 1 fox! jumps-.ver. the 'lazy' doG? !"
result -> (" T?e qu!ck ' brown 1 fox!", "jumps-.ver.", "the 'lazy' doG?", "!")
So basically I want to split at ". ", "! " or "? ", but I want the spaces at the split points to be removed while the dot, exclamation mark or question mark is kept.
How can I do this in an efficient way?
The str split function takes only one separator. I wonder whether the best solution is to split on all spaces and then, when constructing the required result, look for the pieces that end with a dot, exclamation mark or question mark.
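The "split on all spaces, then regroup" idea from the question can be done without regular expressions. This is a minimal sketch; the function name `split_keep_punct` and its `marks` parameter are hypothetical, not part of any library.

```python
def split_keep_punct(text: str, marks: str = ".!?") -> list[str]:
    """Split `text` at spaces that follow a character in `marks`,
    dropping those spaces but keeping the mark (hypothetical helper)."""
    pieces: list[str] = []
    current: list[str] = []
    # str.split(" ") splits on every single space and keeps empty strings,
    # so a leading space and the spacing inside each piece survive the re-join.
    for word in text.split(" "):
        current.append(word)
        # A word ending in one of the marks closes the current piece; the
        # space that followed it is the split point and is not re-added.
        if word and word[-1] in marks:
            pieces.append(" ".join(current))
            current = []
    # Text after the last mark (if any) forms the final piece.
    if current:
        pieces.append(" ".join(current))
    return pieces


if __name__ == "__main__":
    text = " T?e qu!ck ' brown 1 fox! jumps-.ver. the 'lazy' doG? !"
    print(split_keep_punct(text))
    # [" T?e qu!ck ' brown 1 fox!", 'jumps-.ver.', "the 'lazy' doG?", '!']
```

One difference from the regular-expression answer: because this splits on single spaces, a run of several spaces after a mark leaves a leading space on the next piece (for example `"a!  b"` gives `["a!", " b"]`).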
You can achieve this using a regular expression split:
>>> import re
>>> text = " T?e qu!ck ' brown 1 fox! jumps-.ver. the 'lazy' doG? !"
>>> re.split('(?<=[.!?]) +',text)
[" T?e qu!ck ' brown 1 fox!", 'jumps-.ver.', "the 'lazy' doG?", '!']
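The regular expression '(?<=[.!?]) +' means: match a sequence of one or more spaces (' +') only if it is preceded by a ., ! or ? character ('(?<=[.!?])'). Because the lookbehind is zero-width, the punctuation character is not part of the match, so it stays at the end of each piece while the spaces are removed.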
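To make the role of the lookbehind concrete, the sketch below contrasts the answer's pattern with one that consumes the punctuation, and adds a lazy `finditer`-based variant since the question asks about efficiency. The names `SENTENCE_GAP`, `CONSUMING_GAP`, `split_sentences` and `iter_sentences` are hypothetical; only `re.compile`, `Pattern.split` and `Pattern.finditer` are standard-library calls.

```python
import re
from typing import Iterator

# Pattern from the answer. (?<=[.!?]) is a zero-width lookbehind: it checks
# the previous character but does not consume it, so only the spaces ( +)
# are removed by the split.
SENTENCE_GAP = re.compile(r"(?<=[.!?]) +")

# For contrast only: here the punctuation is part of the match, so
# split() discards it together with the spaces.
CONSUMING_GAP = re.compile(r"[.!?] +")


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' plus spaces, keeping the punctuation."""
    return SENTENCE_GAP.split(text)


def iter_sentences(text: str) -> Iterator[str]:
    """Lazy variant (hypothetical helper): yields the same pieces as
    split_sentences(text) one at a time instead of building a list."""
    start = 0
    for match in SENTENCE_GAP.finditer(text):
        yield text[start:match.start()]
        start = match.end()
    # Whatever follows the last gap (or the whole text if there is none).
    yield text[start:]


if __name__ == "__main__":
    text = " T?e qu!ck ' brown 1 fox! jumps-.ver. the 'lazy' doG? !"
    print(split_sentences(text))
    # [" T?e qu!ck ' brown 1 fox!", 'jumps-.ver.', "the 'lazy' doG?", '!']
    print(CONSUMING_GAP.split(text))
    # [" T?e qu!ck ' brown 1 fox", 'jumps-.ver', "the 'lazy' doG", '!']
    print(list(iter_sentences(text)) == split_sentences(text))  # True
```

Whether the generator is faster depends on the input; its main benefit is not holding every piece in memory at once. For ordinary strings, the one-line `re.split` from the answer is the simpler choice.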