Need a good regex to convert URLs to links but leave existing links alone

Problem description

I have a load of user-submitted content. It is HTML, and may contain URLs. Some of them will already be <a> elements (if the user is good), but sometimes users are lazy and just type www.something.com or, at best, http://www.something.com.

I can't find a decent regex to capture URLs but ignore ones that are immediately to the right of either a double quote or '>'. Anyone got one?

Solution

Jan Goyvaerts, creator of RegexBuddy, has written a response to Jeff Atwood's blog that addresses the issues Jeff had and provides a nice solution.

\b(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]

In order to ignore matches that occur right next to a " or >, you could add (?<![">]) to the start of the regex, so you get

(?<![">])\b(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]

This will match full addresses (http://...) and addresses that start with www. or ftp. - you're out of luck with addresses like ars.userfriendly.org...
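
As a rough illustration of how the second pattern might be applied, here is a minimal Python sketch. The function name `linkify`, the choice of `http://` as the default scheme for `www.` addresses, and the sample strings are all assumptions made for the example, not part of the original answer. The pattern lists only `A-Z`, so the sketch also assumes it is meant to be run case-insensitively.

```python
import re

# The pattern from the answer, compiled case-insensitively (an assumption:
# the character classes list only A-Z, so lowercase URLs need the flag).
URL_RE = re.compile(
    r'(?<![">])\b(?:(?:https?|ftp|file)://|www\.|ftp\.)'
    r'[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]',
    re.IGNORECASE,
)


def linkify(html):
    """Wrap bare URLs in <a> tags (hypothetical helper for illustration)."""

    def to_anchor(match):
        url = match.group(0)
        if "://" in url:
            href = url                 # already has a scheme
        elif url.lower().startswith("ftp."):
            href = "ftp://" + url      # assumed scheme for ftp. hosts
        else:
            href = "http://" + url     # assumed scheme for www. hosts
        return '<a href="{}">{}</a>'.format(href, url)

    return URL_RE.sub(to_anchor, html)


if __name__ == "__main__":
    print(linkify("Visit www.something.com today"))
    print(linkify('<a href="http://something.com">http://something.com</a>'))
    print(linkify("Read ars.userfriendly.org sometime"))
```

One thing to watch for: the lookbehind only inspects the single character before the match start. An existing link such as `<a href="http://www.something.com">` is skipped at `http`, but the engine can still start a match at the inner `www.`, because the character before it is `/`. Content like that probably needs an extra check, or an HTML parser, before the substitution is applied.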
