s/(?P<head>\[\[foo[^\[]*)abc/\g<head>def
s/(?=\[\[foo[^\[]*)abc/def
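Which one is more efficient? Are there other ways to improve efficiency? Note that although I use Perl-style syntax for illustration, I am actually using Python's re library, which does not support the \K (keep) keyword.

Best answer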
In Python, using (?P<head>\[\[foo[^\[]*)abc with the re module is faster:
import time
import re

# Raw strings keep the backslashes in the patterns intact.
rec1 = re.compile(r'(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile(r'(?=\[\[foo[^\[]*)abc')
total1, total2 = 0.0, 0.0

def timeRE(ver):
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" * 100)
    t1 = time.time()
    if ver == 1:
        rec1.sub("def", x)
    else:
        rec2.sub("def", x)
    return (time.time() - t1)

for x in range(50000):
    total1 += timeRE(1)
for x in range(50000):
    total2 += timeRE(2)
print(total1)
print(total2)

Output:
4.27380466461
16.9591507912
Edit (running both calls in the same loop, so they are interleaved):

for x in range(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)

Output:
4.26199269295
17.2384319305
Edit (fixing the sub-match issue, so the replacement keeps the captured head group):

import time
import re

rec1 = re.compile(r'(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile(r'(?=\[\[foo[^\[]*)abc')
total1, total2 = 0.0, 0.0

def timeRE(ver):
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" * 100)
    t1 = time.time()
    if ver == 1:
        # Re-insert the captured prefix before the replacement text.
        rec1.sub(r"\g<head>def", x)
    else:
        rec2.sub("def", x)
    return (time.time() - t1)

for x in range(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)
print(total1)
print(total2)

Output:
Run 1:
4.62282061577
17.8212277889
Run 2:
4.6660721302
17.1630160809
Run 3:
4.62124109268
17.21393013
Edit (using a string that the regex will match):

import time
import re

rec1 = re.compile(r'(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile(r'(?=\[\[foo[^\[]*)abc')
total1, total2 = 0.0, 0.0

def timeRE(ver):
    # This input contains "[[foo...abc", so the named-group pattern finds a match.
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_<head>_<tail>_</head>_</tail>_abcdefghijklmnopqrstuvwxyz_<head>[[fooBAR_ABCDEFGHIJKLMNOPQRSTUVWXYZ_abc]]]]defghiojklmnopqrstuvwyz" * 100)
    t1 = time.time()
    if ver == 1:
        rec1.sub(r"\g<head>def", x)
    else:
        rec2.sub("def", x)
    return (time.time() - t1)

for x in range(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)
print(total1)
print(total2)

Output:
23.4271130562
29.6934807301
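One thing worth checking alongside the timings is whether the two patterns do the same work. A lookahead is zero-width, so in (?=\[\[foo[^\[]*)abc the literal abc has to match at the very position where [[foo starts, which cannot happen. The following is a minimal sketch on a made-up input (the sample string and the variable names are illustrative, not from the benchmark) showing that only the named-group version performs a substitution:

```python
import re

# The two patterns from the question, written as raw strings.
group_pat = re.compile(r'(?P<head>\[\[foo[^\[]*)abc')
lookahead_pat = re.compile(r'(?=\[\[foo[^\[]*)abc')

# Hypothetical sample input, chosen so that "[[foo" is followed by "abc".
sample = "x [[fooBAR_abc]] y"

# Named group: consume the prefix, then put it back in the replacement.
group_result = group_pat.sub(r'\g<head>def', sample)

# Lookahead: asserts "[[foo..." at the current position, then needs "abc"
# at that same position, so it never matches.
lookahead_result = lookahead_pat.sub('def', sample)

print(group_result)      # x [[fooBAR_def]] y
print(lookahead_result)  # x [[fooBAR_abc]] y (unchanged)
```

So the second timing reflects a scan that never substitutes anything. What the question seems to want is a lookbehind, but Python's re module only accepts fixed-width lookbehind patterns, so the capture-and-reinsert form is the practical choice here.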
Final run:

import time
import re

rec1 = re.compile(r'(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile(r'(?=\[\[foo[^\[]*)abc')
total1, total2 = 0.0, 0.0

def timeRE(ver):
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_<head>_<tail>_</head>_</tail>_abcdefghijklmnopqrstuvwxyz_<head>[[fooBAR_ABCDEFGHIJKLMNOPQRSTUVWXYZ_abc]]]]defghiojklmnopqrstuvwyz" * 100)
    t1 = time.time()
    if ver == 1:
        rec1.sub(r"\g<head>def", x)
    else:
        rec2.sub("def", x)
    return (time.time() - t1)

for x in range(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)
# Report the average per call as well as the accumulated total.
print("Method 1: Avg run took: %+0.7f - With a total of: %+0.7f" % ((total1 / 50000.0), total1))
print("Method 2: Avg run took: %+0.7f - With a total of: %+0.7f" % ((total2 / 50000.0), total2))

Output:
Method 1: Avg run took: +0.0004924 - With a total of: +24.6196477
Method 2: Avg run took: +0.0005921 - With a total of: +29.6053855
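Wrapping a single call in time.time() includes timer overhead and is sensitive to whatever else the machine is doing. As a rough alternative, here is a sketch of the same comparison using the standard timeit module. The helper name bench, the variable names, and the repeat counts are my own choices rather than part of the original benchmark, and the numbers will differ by machine and Python version:

```python
import re
import timeit

group_pat = re.compile(r'(?P<head>\[\[foo[^\[]*)abc')
lookahead_pat = re.compile(r'(?=\[\[foo[^\[]*)abc')

# Same matching test string as in the final run.
text = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_"
        "<head>_<tail>_</head>_</tail>_abcdefghijklmnopqrstuvwxyz_<head>"
        "[[fooBAR_ABCDEFGHIJKLMNOPQRSTUVWXYZ_abc]]]]defghiojklmnopqrstuvwyz" * 100)

def bench(func, number=100, repeat=3):
    """Return the best average time per call, in seconds.

    Taking the minimum over several rounds reduces noise from other processes.
    """
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

t_group = bench(lambda: group_pat.sub(r'\g<head>def', text))
t_look = bench(lambda: lookahead_pat.sub('def', text))

print("named group: %.7f s per call" % t_group)
print("lookahead:   %.7f s per call" % t_look)
```

The counts are kept small so the sketch finishes quickly; raising number and repeat gives steadier figures.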
Regarding "python - Which regex is more efficient?", the original question is on Stack Overflow: https://stackoverflow.com/questions/7423525/