Problem description
I need a regex in Python 2 that matches only horizontal whitespace, not newlines.
\s matches all whitespace characters, including newlines.
>>> re.sub(r"\s", "", "line 1.\nline 2\n")
'line1.line2'
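\h does not work at all: Python's re module has no \h escape (it comes from Perl and PCRE), and Python 2 treats it as a literal "h", so nothing is replaced.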
>>> re.sub(r"\h", "", "line 1.\nline 2\n")
'line 1.\nline 2\n'
[\t ] works, but I am not sure whether it misses other whitespace characters, especially in Unicode, such as \u00A0 (no-break space) or \u200A (hair space). Many more whitespace characters are listed at the following link: https://www.cs.tut.fi/~jkorpela/chars/spaces.html
>>> re.sub(r"[\t ]", "", u"line 1.\nline 2\n\u00A0\u200A\n", flags=re.UNICODE)
u'line1.\nline2\n\xa0\u200a\n'
Do you have any suggestions?
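One way to check whether [\t ] misses anything, without copying a table by hand, is to ask the interpreter's own Unicode database. This is a sketch under an assumption, not the approach used in the question: it treats "horizontal whitespace" as the tab character plus every character in Unicode category Zs (space separators), which includes \u00A0 and \u200A, and compiles them into a single character class. The names HORIZONTAL_CHARS and HORIZONTAL_WS are hypothetical. It should run on Python 2.7 and Python 3, and the exact set depends on the Unicode version bundled with the interpreter.

```python
import re
import unicodedata

try:
    unichr  # Python 2 name for building a Unicode character from a code point
except NameError:
    unichr = chr  # Python 3: chr() already returns a Unicode character

# Assumption: "horizontal whitespace" = tab plus Unicode category Zs (space separators).
# All Zs characters lie in the Basic Multilingual Plane, so scanning to U+FFFF is enough.
HORIZONTAL_CHARS = [u"\t"] + [
    unichr(cp) for cp in range(0x10000)
    if unicodedata.category(unichr(cp)) == "Zs"
]

# re.escape keeps every character literal inside the class on Python 2 and 3.
HORIZONTAL_WS = re.compile(
    u"[" + u"".join(re.escape(c) for c in HORIZONTAL_CHARS) + u"]", re.UNICODE
)

if __name__ == "__main__":
    text = u"line 1.\nline 2\n\u00A0\u200A\n"
    # Spaces, U+00A0 and U+200A are removed; the newlines stay.
    print(repr(HORIZONTAL_WS.sub(u"", text)))
    # List the code points that ended up in the class.
    print([u"U+%04X" % ord(c) for c in HORIZONTAL_CHARS])
```

Because the class is built only from tab and category Zs, it does not match \r, \v, \f or the line and paragraph separators U+2028 and U+2029, none of which are in Zs.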
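I ended up using [^\S\n] instead of listing every Unicode whitespace character. The class matches any character that is neither a non-whitespace character (\S) nor a newline (\n), i.e. any whitespace except \n.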
>>> re.sub(r"[^\S\n]", "", u"line 1.\nline 2\n\u00A0\u200A\n", flags=re.UNICODE)
u'line1.\nline2\n\n'
>>> re.sub(r"[\t ]", "", u"line 1.\nline 2\n\u00A0\u200A\n", flags=re.UNICODE)
u'line1.\nline2\n\xa0\u200a\n'
It works as expected.
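The [^\S\n] trick can be packaged for reuse. The sketch below is illustrative only: strip_horizontal_ws, NOT_NEWLINE_WS and NOT_LINEBREAK_WS are hypothetical names, and it should run on Python 2.7 and Python 3. Since [^\S\n] excludes only \n, it still matches \r (which matters for Windows CRLF text) as well as \v, \f and Unicode line separators such as \u2028; the [^\S\r\n] variant keeps carriage returns too.

```python
import re

# [^\S\n] reads "not (non-whitespace or newline)", i.e. any whitespace except "\n".
# re.UNICODE makes \S Unicode-aware on Python 2; on Python 3 str patterns already are.
NOT_NEWLINE_WS = re.compile(r"[^\S\n]", re.UNICODE)

# Variant: also exclude "\r", so CRLF line endings survive intact.
NOT_LINEBREAK_WS = re.compile(r"[^\S\r\n]", re.UNICODE)


def strip_horizontal_ws(text, pattern=NOT_NEWLINE_WS):
    """Delete every character matched by `pattern` (hypothetical helper)."""
    return pattern.sub(u"", text)


if __name__ == "__main__":
    crlf = u"line 1.\r\nline 2\u00A0\r\n"
    # [^\S\n] also deletes the "\r" of each CRLF pair.
    print(repr(strip_horizontal_ws(crlf)))
    # [^\S\r\n] keeps "\r\n" intact.
    print(repr(strip_horizontal_ws(crlf, NOT_LINEBREAK_WS)))
```

If only "true" horizontal whitespace should be removed, an explicit class (for example, tab plus Unicode category Zs) avoids matching \v, \f and \u2028, at the cost of maintaining the character list.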