Problem description
I have a load of user-submitted content. It is HTML, and may contain URLs. Some of them will already be <a> links (if the user is good), but sometimes users are lazy and just type www.something.com or, at best, http://www.something.com.
I can't find a decent regex to capture URLs but ignore ones that are immediately to the right of either a double quote or '>'. Anyone got one?
Jan Goyvaerts, creator of RegexBuddy, has written a response to Jeff Atwood's blog that addresses the issues Jeff had and provides a nice solution.
\b(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]
In order to ignore matches that occur right next to a " or >, you could add (?<![">]) to the start of the regex, so you get:
(?<![">])\b(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]
This will match full addresses (http://...) and addresses that start with www. or ftp. - you're out of luck with addresses like ars.userfriendly.org...
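
As a rough illustration of how the pattern above might be applied, here is a minimal Python sketch. The helper name linkify, the http:// and ftp:// prefixes added to bare www. and ftp. addresses, and the STRICT_URL_RE variant are assumptions made for this example, not part of the answer. The character classes list only A-Z, so the sketch assumes case-insensitive matching.

```python
import re

# Prefixes and character classes copied from the answer above. They list only
# uppercase A-Z, so this sketch assumes case-insensitive matching.
_URL_BODY = (
    r'\b(?:(?:https?|ftp|file)://|www\.|ftp\.)'
    r'[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]'
)

# The pattern as given: skip matches that start right after " or >.
URL_RE = re.compile(r'(?<![">])' + _URL_BODY, re.IGNORECASE)

# Assumption (not in the answer): also skip matches that start right after "/",
# so the "www." inside an already linked "http://www..." is left alone.
STRICT_URL_RE = re.compile(r'(?<![">/])' + _URL_BODY, re.IGNORECASE)


def linkify(html, pattern=URL_RE):
    """Wrap bare URLs in <a> tags (hypothetical helper for illustration)."""

    def to_anchor(match):
        url = match.group(0)
        lowered = url.lower()
        if lowered.startswith("www."):
            href = "http://" + url  # assumed default scheme for www. addresses
        elif lowered.startswith("ftp."):
            href = "ftp://" + url   # assumed default scheme for ftp. addresses
        else:
            href = url
        return '<a href="{}">{}</a>'.format(href, url)

    return pattern.sub(to_anchor, html)


print(linkify("Visit www.something.com today"))
```

One caveat worth testing against your own data: the lookbehind only inspects the character just before the start of a match. In an existing link such as <a href="http://www.x.com">, the www. part is preceded by / rather than " or >, so URL_RE can still match it and nest a second link inside the first. Passing STRICT_URL_RE to linkify avoids that case, at the cost of also skipping any bare URL that happens to follow a slash.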