Regular expression to convert URLs into links

Problem description

I 'borrowed' a regex from this website: http://daringfireball.net/2010/07/improved_regex_for_matching_urls. It is almost complete, but I also want it to match example.com.
I know that Stack Overflow is not doyourhomework.com, but I spent a long time thinking about this without results. Here is a fiddle to test: http://jsfiddle.net/BGnMm/25/. As you can see at the end, example.com is not turned into a link.

var reg=/\b((?:[a-z][\w-]+:(?:\/*)|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»""‘’]))/gi;
var allurl="http:foo.com/blah_blah http://foo.com/blah_blah/ (Something like http://foo.com/blah_blah) http://foo.com/blah_blah_(wikipedia) http://foo.com/more_(than)_one_(parens) (Something like http://foo.com/blah_blah_(wikipedia)) http://foo.com/blah_(wikipedia)#cite-1 http://foo.com/blah_(wikipedia)_blah#cite-1 http://foo.com/unicode_(✪)_in_parens http://foo.com/(something)?after=parens http://foo.com/blah_blah. http://foo.com/blah_blah/. <http://foo.com/blah_blah> <http://foo.com/blah_blah/> http://foo.com/blah_blah, http://www.extinguishedscholar.com/wpglob/?p=364. http://✪df.ws/1234 rdar://1234 rdar:/1234 x-yojimbo-item://6303E4C1-6A6E-45A6-AB9D-3A908F59AE0E message://%[email protected]%3e http://➡.ws/䨹 www.c.ws/䨹 <tag>http://example.com</tag> Just a www.example.com link. http://example.com/something?with,commas,in,url, but not at end What about <mailto:[email protected]?subject=TEST> (including brokets). mailto:[email protected] bit.ly/foo "is.gd/foo/" WWW.EXAMPLE.COM http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55))/Web_ENG/View_DetailPhoto.aspx?PicId=752 http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55)) http://lcweb2.loc.gov/cgi-bin/query/h?pp/horyd:@field(NUMBER+@band(thc+5a46634)) 6:00p filename.txt http://example.com/quotes-are-"part" ✪df.ws/1234 example.com example.com/";
document.write(allurl.replace(reg,"<a href='$1' >$1</a><br />"));

Recommended answer

Add an alternation operator (|) after the {2,4}\/, i.e.

    var reg=/\b((?:[a-z][\w-]+:(?:\/*)|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/|)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»""‘’]))/gi;
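
As a rough check of what this change does, the snippet below runs the pattern from the question and the modified pattern over the same text. It is only a sketch: the names strict, loose, and sample, and the sample string itself, are made up for illustration.

```javascript
// Pattern from the question: a match must start with an "indicator"
// (a scheme such as http:, a www. prefix, or a domain followed by a slash).
var strict = /\b((?:[a-z][\w-]+:(?:\/*)|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»""‘’]))/gi;

// Pattern from this answer: identical, except for the empty alternative
// added after {2,4}\/ in the first group.
var loose = /\b((?:[a-z][\w-]+:(?:\/*)|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/|)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»""‘’]))/gi;

// Hypothetical sample text, loosely based on the test string in the question.
var sample = "Just a www.example.com link. example.com filename.txt";

// String.prototype.match with a global regex returns all matches, or null.
var strictMatches = sample.match(strict) || [];
var looseMatches = sample.match(loose) || [];

console.log(strictMatches.join(" | "));
// www.example.com
console.log(looseMatches.join(" | "));
// Just | www.example.com | link | example.com | filename.txt
```

The modified pattern does pick up example.com, but it also picks up plain words and file names of two or more characters.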

There is something you should understand about this change. The first non-capturing group, (?: ... ), looks for "indicators" of URLs. One indicator, for example, is www (followed by up to three digits). You, however, are asking for a way to identify URLs that have no indicator at all. So what the change above does is add one more clause, "or an empty match", as a "valid" indicator. The consequence is that your regular expression is now less selective: all sorts of strings, not only bare domains such as example.com, are identified as URLs! Your only other (readily available) option is to be more selective about suffixes, e.g. to require specific suffixes (com|org|net), but that takes away from the generality of the original regex, which doesn't name any specific suffixes (it only asks for two to four letters).
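
The suffix-based option could look roughly like the sketch below. It is not part of the original answer: the lookahead clause, the name suffixReg, and the sample string are assumptions for illustration, and com|org|net is only the short example list from the text above.

```javascript
// Same pattern as in the question, with one extra indicator: a lookahead
// that succeeds only when the upcoming text looks like name.com, name.org,
// or name.net. The lookahead consumes nothing, so the rest of the pattern
// still matches the whole URL. The trailing \b rejects longer suffixes
// that merely start with com, org, or net.
var suffixReg = /\b((?:[a-z][\w-]+:(?:\/*)|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/|(?=[a-z0-9.\-]+[.](?:com|org|net)\b))(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»""‘’]))/gi;

// Hypothetical sample text, loosely based on the test string in the question.
var suffixSample = "Just a www.example.com link. example.com filename.txt";

var suffixMatches = suffixSample.match(suffixReg) || [];

console.log(suffixMatches.join(" | "));
// www.example.com | example.com
```

With this variant, ordinary words and filename.txt are left alone, at the cost of missing bare domains whose suffix is not in the list.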

In other words, you are probably facing a limitation of logic, not a limitation of your regex-writing skills or of the regex language itself.
