Problem description
I'm trying to replace URLs contained in a block of HTML code that users post into an old web app with proper anchors (<a>) for those URLs.
The problem is that URLs can already be 'anchored', that is, contained in <a> elements. Those URLs should not be replaced.
For example:
<a href="http://noreplace.com">http://noreplace.com</a> <- do not replace
<a href="http://noreplace.com"><u>http://noreplace.com</u></a> <- do not replace
<a href="...">...</a>http://replace.com <- replace
What would a regex that matches only 'not anchored' URLs look like?
I use the following function to replace with RegExp:
' Replaces every match of strPattern in strString with strReplace
' (case-insensitive, all occurrences).
Function ReplaceRegExp(strString, strPattern, strReplace)
    Dim RE: Set RE = New RegExp
    With RE
        .Pattern = strPattern
        .IgnoreCase = True
        .Global = True
        ReplaceRegExp = .Replace(strString, strReplace)
    End With
End Function
The following non-greedy regex is used to format UBB URLs. Can it be adapted to match only the ones I need?
' the double doublequote in the brackets is because
' double doublequoting is ASP escaping for doublequotes
strString = ReplaceRegExp(strString, "\[URL=[""]?(http|ftp|https)(:\/\/[\w\-_]+)((\.[\w\-_]+)+)([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?[""]?\](.*?)\[/URL\]", "<a href=""$1$2$3$5"" target=""_blank"">$6</a>")
If this really cannot be done with a regex, what would the solution be in ASP Classic, with some code or pseudocode please? However, I would much rather keep the code simple by adding one more regex line than add extra functions to this old code.
Thanks for your effort!
Recommended answer
The answer you're looking for is in negative and positive lookaheads and lookbehinds.
This article gives a pretty good overview: http://www.regular-expressions.info/lookaround.html
Here's the regular expression I've formulated for your case:
(?<!"|>)(ht|f)tps?://.*?(?=\s|$)
Here's some sample data I matched against:
#Matches
http://www.website.com
https://www.website.com
This is a link http://www.website.com that is not linked
This is a long link http://www.website.com/index.htm?foo=bar
ftp://www.website.com
#No Matches
<u>http://www.website.com</u>
<a href="http://www.website.com">http://website.com</a>
<a href="https://www.website.com">http://website.com</a>
<a href="http://www.website.com"><u>http://www.website.com</u></a>
<a href="ftp://www.website.com">ftp://www.website.com</a>
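The sample data above can be checked mechanically. The following is a minimal sketch in Python rather than ASP Classic, on the assumption that an engine with lookbehind support is available; as far as I know, the VBScript RegExp object used in the question supports lookahead but not lookbehind, so the pattern may need reworking there. The helper name `first_match` is made up for this illustration.

```python
import re

# Pattern copied from the answer. Python's re accepts this lookbehind
# because both alternatives (" and >) are exactly one character wide.
URL_RE = re.compile(r'(?<!"|>)(ht|f)tps?://.*?(?=\s|$)')

SHOULD_MATCH = [
    'http://www.website.com',
    'https://www.website.com',
    'This is a link http://www.website.com that is not linked',
    'This is a long link http://www.website.com/index.htm?foo=bar',
    'ftp://www.website.com',
]

SHOULD_NOT_MATCH = [
    '<u>http://www.website.com</u>',
    '<a href="http://www.website.com">http://website.com</a>',
    '<a href="https://www.website.com">http://website.com</a>',
    '<a href="http://www.website.com"><u>http://www.website.com</u></a>',
    '<a href="ftp://www.website.com">ftp://www.website.com</a>',
]


def first_match(text):
    """Return the first URL the pattern finds in text, or None (hypothetical helper)."""
    m = URL_RE.search(text)
    return m.group(0) if m else None


for sample in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(repr(sample), '->', first_match(sample))
```

One caveat worth checking against real posts: because the lookbehind rejects any URL directly preceded by `>`, a bare URL that immediately follows a closing tag, such as the third example in the question (`<a href="...">...</a>http://replace.com`), is also skipped by this pattern.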
Here's a breakdown of what the regular expression is doing:
(?<!"|>)
A negative lookbehind, making sure that what matches next isn't preceded by a " or a >.
(ht|f)tps?://.*?
This looks for http, https and ftp, and anything that follows it. It will also match ftps! If you want to avoid that, you can use (https?|ftp)://.*? instead.
(?=\s|$)
This is a positive lookahead, which requires a whitespace character or the end of the line to follow, without making it part of the match.
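To connect the breakdown back to the original goal, here is a rough sketch of how the three pieces could be used to wrap bare URLs in anchors. It is written in Python as an illustration only; the function name `link_bare_urls` and the `target="_blank"` attribute (borrowed from the UBB replacement in the question) are assumptions, not part of the answer.

```python
import re

# lookbehind + scheme and lazy body + lookahead, as broken down above
URL_RE = re.compile(r'(?<!"|>)(ht|f)tps?://.*?(?=\s|$)')


def link_bare_urls(html):
    """Wrap every URL matched by URL_RE in an <a> element (hypothetical helper)."""
    # \g<0> is the whole match, i.e. the URL itself
    return URL_RE.sub(r'<a href="\g<0>" target="_blank">\g<0></a>', html)


print(link_bare_urls('See http://replace.com now'))
print(link_bare_urls('<a href="http://noreplace.com">http://noreplace.com</a>'))
```

The second call leaves its input unchanged, because both occurrences of the URL are preceded by `"` or `>`.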
Bonus
(ht)?(?(1)tps?|ftp)://
This will match http/https/ftp but not ftps. It may be a bit overkill when you can use (https?|ftp):// but it's an awesome example of if/else in regex.
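The if/else construct can be tried out in any engine that supports conditionals. The sketch below uses Python's `re` module, which does; whether the engine behind ASP Classic accepts this syntax is not something the answer confirms, so treat it as a demonstration of the construct only. The helper name `accepted_schemes` is invented for the example.

```python
import re

# If group 1 ("ht") matched, continue with "tps?"; otherwise require "ftp".
CONDITIONAL = re.compile(r'(ht)?(?(1)tps?|ftp)://')
# The simpler equivalent mentioned in the answer.
SIMPLE = re.compile(r'(https?|ftp)://')

SCHEMES = ['http://', 'https://', 'ftp://', 'ftps://']


def accepted_schemes(pattern):
    """Return the entries of SCHEMES that pattern matches from the start (hypothetical helper)."""
    return [s for s in SCHEMES if pattern.match(s)]


print(accepted_schemes(CONDITIONAL))
print(accepted_schemes(SIMPLE))
```

Both patterns accept the same three schemes and reject ftps://, which is why the plain alternation is usually the more readable choice.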