Need a good regex to convert URLs to links but leave existing links alone

Question

I have a load of user-submitted content. It is HTML, and may contain URLs. Some of them will already be <a> elements (if the user is good), but sometimes users are lazy and just type www.something.com or, at best, http://www.something.com.

I can't find a decent regex that captures URLs but ignores ones that are immediately to the right of either a double quote or a '>'. Anyone got one?

Recommended answer

Jan Goyvaerts, creator of RegexBuddy, has written a response to Jeff Atwood's blog post that addresses the issues Jeff had and provides a nice solution:

(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]

In order to ignore matches that occur right next to a " or >, you could add (?<![">]) to the start of the regex, so you get

(?<![">])(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]
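To see what the lookbehind changes, here is a small Python sketch that runs the pattern with and without it. The sample string and the names BODY, BASE, GUARDED are made up for illustration. The dots in www\. and ftp\. are written escaped on the assumption that literal dots are intended, and the pattern is compiled case-insensitively because the character classes list only A-Z.

```python
import re

# Pattern body from the answer; dots after www/ftp escaped (assumed intent).
BODY = (
    r'(?:(?:https?|ftp|file)://|www\.|ftp\.)'
    r'[-A-Z0-9+&@#/%=~_|$?!:,.]*'
    r'[A-Z0-9+&@#/%=~_|$]'
)

# Case-insensitive, since the classes only list uppercase letters.
BASE = re.compile(BODY, re.IGNORECASE)
GUARDED = re.compile(r'(?<![">])' + BODY, re.IGNORECASE)

# Hypothetical user content: one bare URL and one existing link.
sample = (
    'Try www.something.com or '
    '<a href="http://something.com">http://something.com</a>'
)

# Without the lookbehind, the href value and the link text also match.
base_matches = BASE.findall(sample)

# With the lookbehind, only the bare URL is left.
guarded_matches = GUARDED.findall(sample)

print(base_matches)
print(guarded_matches)
```

Note that the lookbehind only inspects the single character before the match. A www. that sits inside an existing attribute, as in href="http://www.something.com", is preceded by a slash, so it would still be matched starting at www.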

This will match full addresses (http://...) and addresses that start with www. or ftp., but you're out of luck with addresses like ars.userfriendly.org.
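As a rough illustration of turning the matches into links, the following Python sketch wraps each match in an <a> element. The function name linkify and the choice to prefix http:// for www. matches and ftp:// for ftp. matches are assumptions, not part of the original answer. It also does not HTML-escape the URL, which real code would probably want to do.

```python
import re

# Pattern from the answer, with the lookbehind; dots after www/ftp are
# escaped (assumed intent) and matching is case-insensitive.
URL_RE = re.compile(
    r'(?<![">])'
    r'(?:(?:https?|ftp|file)://|www\.|ftp\.)'
    r'[-A-Z0-9+&@#/%=~_|$?!:,.]*'
    r'[A-Z0-9+&@#/%=~_|$]',
    re.IGNORECASE,
)


def linkify(html):
    """Wrap bare URLs in <a> tags (hypothetical helper, minimal sketch)."""

    def to_anchor(match):
        url = match.group(0)
        lower = url.lower()
        if lower.startswith('www.'):
            href = 'http://' + url   # assumed default scheme
        elif lower.startswith('ftp.'):
            href = 'ftp://' + url    # assumed default scheme
        else:
            href = url
        return '<a href="{0}">{1}</a>'.format(href, url)

    return URL_RE.sub(to_anchor, html)


print(linkify('See www.something.com, or http://something.com/a?b=1.'))
print(linkify('<a href="http://something.com">http://something.com</a>'))
print(linkify('Read ars.userfriendly.org today'))
```

The trailing comma and full stop stay outside the links because the final character class excludes punctuation. The existing link is left untouched, and ars.userfriendly.org is not converted since it has no scheme and no www. or ftp. prefix.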

