Why does Regex IsMatch() hang?


Question

I have an expression to validate an email address:

string REGEX_EMAIL = @"^\w+([\.\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\¦\}\~]*\w+)*@\w+([\.\-]\w+)*\.\w+([\.\-]\w+)*$";

If the address is correct, the IsMatch() method quickly returns true. But if the address string is long and invalid, the method hangs.

What can I do to speed up this method?

Thanks.

Recommended answer

You have a couple of things going on that are hurting the performance of this regular expression.




  1. Catastrophic backtracking
  2. Too many optional statements
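
The first item comes from the group `([...]*\w+)*`: because the punctuation class is optional, a long run of word characters can be divided between the leading `\w+` and the repeated group in a very large number of ways, and a backtracking engine may try all of them before giving up. A minimal sketch for observing this without freezing the program is shown below. The match timeout, the class name `BacktrackingDemo`, and the method name `TryIsMatch` are assumptions added for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BacktrackingDemo
{
    // The pattern from the question, unchanged.
    public const string OriginalPattern =
        @"^\w+([\.\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\¦\}\~]*\w+)*@\w+([\.\-]\w+)*\.\w+([\.\-]\w+)*$";

    // Returns true or false when the match attempt completes, or null when
    // the attempt is abandoned because it ran longer than the timeout.
    public static bool? TryIsMatch(string input, TimeSpan timeout)
    {
        // Assumption: the three-argument constructor with a match timeout
        // (.NET Framework 4.5 and later) is available.
        var regex = new Regex(OriginalPattern, RegexOptions.None, timeout);
        try
        {
            return regex.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }
}
```

Calling `BacktrackingDemo.TryIsMatch(new string('a', 40), TimeSpan.FromSeconds(1))` is one way to probe a long, invalid input. Whether it reports a timeout or a plain non-match depends on the runtime's regex engine, so treat the result as a diagnostic rather than a guarantee.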

You can definitely improve performance by using + instead of * in a few key places, but this of course changes what the regular expression will and won't match. So the easiest fix I found is actually covered in the catastrophic backtracking article above. You can use a nonbacktracking subexpression to drastically improve performance in this case, without changing the regular expression's behavior in any way that matters.

The nonbacktracking subexpression looks like this: (?>pattern)

So try this regular expression instead:

^\w+(?>[\.\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\¦\}\~]*\w+)*@\w+([\.\-]\w+)*\.\w+([\.\-]\w+)*$
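
As a rough sketch of how this pattern might be wired into C#, the helper below compiles it once and wraps IsMatch(). The class name `AtomicEmailCheck`, the method name `IsValid`, and the one-second timeout are assumptions for illustration. The timeout is only a safety net and is not something the answer itself proposes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicEmailCheck
{
    // Pattern from the answer: the first repeated group is atomic, (?>...),
    // so once an iteration has matched, the engine cannot re-split it.
    private static readonly Regex Email = new Regex(
        @"^\w+(?>[\.\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\¦\}\~]*\w+)*@\w+([\.\-]\w+)*\.\w+([\.\-]\w+)*$",
        RegexOptions.None,
        TimeSpan.FromSeconds(1)); // assumption: guard against unexpected slow inputs

    public static bool IsValid(string address)
    {
        try
        {
            return Email.IsMatch(address);
        }
        catch (RegexMatchTimeoutException)
        {
            // Treat an abandoned match as "not valid" in this sketch.
            return false;
        }
    }
}
```

With this version, `AtomicEmailCheck.IsValid("john.doe@example.com")` matches, and a long run of word characters with no `@` is rejected after at most a quadratic number of steps instead of an exponential one.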






On a slightly related topic, my philosophy for checking for a valid email address is a bit different. For one, long regular expressions like this one can have performance problems, as you've found.

Secondly, there's the upcoming promise of email address internationalization, which complicates all of this even more.

Lastly, the main purpose of any regular-expression-based email validation is to catch typos and blatant attempts to get through your form without entering a real email address. But checking whether an email address is genuine requires sending an email to that address.

So my philosophy is to err on the side of accepting too much. And that, in fact, is a very simple thing to do:

^.+@.+\..+$

This should match any conceivably valid email address, and some invalid ones as well.
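
A minimal sketch of this permissive check in C# follows. The class name `PermissiveEmailCheck` and the method name `LooksLikeEmail` are hypothetical. The check only confirms the rough shape "something, then @, then something, then a dot, then something".

```csharp
using System.Text.RegularExpressions;

public static class PermissiveEmailCheck
{
    // The permissive pattern from the answer, compiled once and reused.
    private static readonly Regex Shape = new Regex(@"^.+@.+\..+$");

    // True when the input has the general shape of an email address.
    // This does not prove that the address exists or can receive mail.
    public static bool LooksLikeEmail(string input)
    {
        return Shape.IsMatch(input);
    }
}
```

For example, `LooksLikeEmail("user@example.com")` is accepted and `LooksLikeEmail("user@localhost")` is rejected because nothing after the `@` contains a dot. An input such as `"!!@!!.!!"` is also accepted, which is the "accepting too much" trade-off described above.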

