Question
Are there any real differences between Ruby regex and Python regex?
I've been unable to find any differences between the two, but I may have missed something.
Recommended answer
The last time I checked, they differed substantially in their Unicode support. Ruby, as of 1.9 at least, has some very limited Unicode support. I believe one or two Unicode properties might be supported by now; the general categories and maybe the scripts are the two I'm thinking of.
Python has both less and more Unicode support at the same time. Python does seem to make it possible to meet the requirements of RL1.2a "Compatibility Properties" from UTS#18 on Unicode Regular Expressions.
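As a rough illustration of the compatibility-property side of this, here is a minimal sketch assuming Python 3, where `str` patterns in the standard `re` module are Unicode-aware by default. The sample strings are my own, not from any specification.

```python
import re

# Assumption: Python 3, where str patterns are Unicode-aware by default,
# so the shorthand classes \w and \d cover non-ASCII characters as well.
sample = "na\u00efve caf\u00e9"          # "naive cafe" with precomposed accents
arabic_indic = "\u0663\u0664\u0665"      # Arabic-Indic digits 3, 4, 5

words = re.findall(r"\w+", sample)

# \d accepts any decimal digit in Unicode mode...
unicode_digits_match = re.fullmatch(r"\d+", arabic_indic) is not None

# ...but only 0-9 once the re.ASCII flag is set.
ascii_digits_match = re.fullmatch(r"\d+", arabic_indic, flags=re.ASCII) is not None
```

This only shows that the shorthand classes are Unicode-aware; it says nothing about named properties such as Script, which the standard module does not expose.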
That said, there is a really rather nice Python library out there by Matthew Barnett (mrab) that finally adds a couple of Unicode properties to Python regexes. He supports the two most important ones: the general categories, and the script properties. It has some other intriguing features as well. It deserves some good publicity.
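For a feel of what that looks like, here is a small sketch assuming the library is installed as the third-party `regex` package from PyPI. The helper names (`uppercase_runs`, `greek_runs`) are hypothetical and only there for illustration.

```python
try:
    import regex  # third-party package (pip install regex), not the stdlib re
except ImportError:
    regex = None  # the helpers below then raise a clear error instead


def _require_regex():
    """Return the regex module, or fail loudly if it is not installed."""
    if regex is None:
        raise RuntimeError("the third-party 'regex' package is not installed")
    return regex


def uppercase_runs(text):
    """Runs of characters whose General_Category is Lu (uppercase letter)."""
    return _require_regex().findall(r"\p{Lu}+", text)


def greek_runs(text):
    """Runs of characters whose Script property is Greek."""
    return _require_regex().findall(r"\p{Script=Greek}+", text)
```

Calling `greek_runs("abc \u03b1\u03b2\u03b3 def")` should pick out just the Greek letters, which is something the standard `re` module has no direct syntax for.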
I don't think either Ruby or Python supports Unicode all that terribly well, although more and more gets done every day. In particular, however, neither meets even the barebones Level 1 requirements for Unicode Regular Expressions cited above. For example, RL1.2 requires that at least 11 properties be supported: General_Category, Script, Alphabetic, Uppercase, Lowercase, White_Space, Noncharacter_Code_Point, Default_Ignorable_Code_Point, ANY, ASCII, and ASSIGNED.
I think Python only lets you get to some of those, and only in a roundabout way. Of course, there are many, many other properties beyond these 11.
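One roundabout route I have in mind is to build the character class yourself from the standard `unicodedata` module. This is only a sketch: the helper name `category_class` and the `limit` cutoff are my own assumptions, and it covers General_Category only, not the other properties.

```python
import re
import unicodedata


def category_class(category, limit=0x3000):
    """Build a regex character class from a General_Category value.

    Hypothetical helper: it scans code points below `limit` (an arbitrary
    cutoff chosen to keep the class small) and keeps those whose category
    matches. An empty result would give an invalid class, so pick a
    category that actually occurs in the scanned range.
    """
    members = (
        chr(cp) for cp in range(limit)
        if unicodedata.category(chr(cp)) == category
    )
    return "[" + "".join(re.escape(ch) for ch in members) + "]"


# Rough stand-in for \p{Lu}+ using only the standard library.
UPPERCASE_RUN = re.compile(category_class("Lu") + "+")

found = UPPERCASE_RUN.findall("ABC def \u0393\u0394 \u0436\u0416")
```

It works, but you end up generating by hand what a property escape would give you directly, and only for the range you chose to scan.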
When you're looking for Unicode support, there's more to it than just UTS#18 on Regular Expressions of course, although that is the one that matters most to this question, and neither Ruby nor Python is Level 1 compliant. Other very important aspects of Unicode include UAX#15, UAX#14, UTS#10, UAX#11, UAX#29, and of course the crucial UAX#44. Python has libraries for at least a couple of those, I know. I don't know whether they're standard.
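For what it's worth, a couple of these do appear to be reachable from Python's standard `unicodedata` module: normalization (UAX#15) and East Asian width (UAX#11). A minimal sketch, with sample strings of my own choosing:

```python
import re
import unicodedata

composed = "caf\u00e9"       # e-acute as a single code point
decomposed = "cafe\u0301"    # plain e followed by a combining acute accent

pattern = re.compile(composed)

# The two spellings look identical but are different code point sequences,
# so a pattern written in one form does not match text in the other.
raw_match = pattern.fullmatch(decomposed) is not None

# UAX#15: normalizing to NFC first makes the comparison meaningful.
normalized_match = (
    pattern.fullmatch(unicodedata.normalize("NFC", decomposed)) is not None
)

# UAX#11: East Asian width of a CJK ideograph versus an ASCII letter.
wide = unicodedata.east_asian_width("\u4e2d")
narrow = unicodedata.east_asian_width("a")
```

The normalization step matters for regex work in particular, since neither engine normalizes the pattern or the subject for you.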
But when it comes to regular expression support, um, there are richer alternatives than just those two, you know. :)