Does Python re (regex) have an alternative to \u Unicode escape sequences?

Question

Python treats \uxxxx as a Unicode character escape inside a string literal (e.g. u"\u2014" is interpreted as the Unicode character U+2014). But I just discovered that, in Python 2.7, the standard re module does not treat \uxxxx in a pattern as a Unicode character. Example:

import re

codepoint = 0x2014  # Say I got this dynamically from somewhere (an int; note the hex literal)

test = u"This string ends with \u2014"
pattern = r"\u%04x$" % codepoint  # Six literal characters \ u 2 0 1 4, then $
assert pattern[-5:] == "2014$"  # Ends with an escape sequence for U+2014
assert re.search(pattern, test) is not None  # Failure -- no match (bad)
assert re.search(pattern, "u2014") is not None  # Success -- this matches (bad)

Obviously, if you can write your regex pattern as a string literal, you get the same effect as if the regex engine itself understood \uxxxx escapes:

import re

test = u"This string ends with \u2014"
pattern = u"\u2014$"
assert pattern[:-1] == u"\u2014"  # Pattern is the actual Unicode char U+2014, followed by $
assert re.search(pattern, test) is not None
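
The difference between the two approaches is easiest to see by comparing the pattern strings themselves. The following is a minimal sketch; the names raw_pattern and literal_pattern are made up for illustration. In the raw-string version the backslash, the "u" and the four hex digits remain separate characters, whereas in the string-literal version the escape has already been resolved to a single character before re sees it.

```python
# Pattern built from a raw string: backslash, "u" and four hex digits stay separate characters.
raw_pattern = r"\u%04x$" % 0x2014
# Pattern written as a string literal: the escape is resolved by the Python parser.
literal_pattern = u"\u2014$"

assert len(raw_pattern) == 7      # \ u 2 0 1 4 $
assert raw_pattern[0] == "\\"     # first character is a literal backslash
assert len(literal_pattern) == 2  # U+2014 followed by $
```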

But what if you need to construct your pattern dynamically?

Recommended answer

Use the unichr() function to create a Unicode character from a code point:

pattern = u"%s$" % unichr(codepoint)  # codepoint must be an int, e.g. 0x2014 (hex), not decimal 2014
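
As a slightly more defensive variant of the same idea, the sketch below wraps the construction in a helper and passes the character through re.escape(), so that a code point which happens to be a regex metacharacter (such as 0x24, "$") is still matched literally. This is only an illustration under some assumptions: pattern_ending_with is a hypothetical helper name, and the code is written for Python 3, where chr() plays the role of Python 2's unichr().

```python
import re

def pattern_ending_with(codepoint):
    """Build a pattern matching strings that end with the given code point.

    Hypothetical helper for illustration; not part of the re module.
    """
    char = chr(codepoint)  # Python 3; in Python 2.7 this would be unichr(codepoint)
    # re.escape keeps regex metacharacters (e.g. "$" or ".") literal.
    return re.escape(char) + "$"

codepoint = int("2014", 16)  # e.g. a hex string received at runtime, converted to an int
test = "This string ends with \u2014"

assert re.search(pattern_ending_with(codepoint), test) is not None
assert re.search(pattern_ending_with(codepoint), "u2014") is None
assert re.search(pattern_ending_with(0x24), "costs $") is not None
```

If the code point arrives as text rather than as an integer, converting it with int(text, 16) first, as above, avoids confusing decimal 2014 with hexadecimal 0x2014.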


08-05 21:11