How to replace text URLs and exclude URLs in HTML tags?

Problem description

I need your help here.
I want to turn this:
sometext sometext http://www.somedomain.com/index.html sometext sometext
into:
sometext sometext <a href="http://www.somedomain.com/index.html">www.somedomain.com/index.html</a> sometext sometext
I have managed it by using this regex:
preg_replace("#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)#ie", "'<a href=\"$1\" target=\"_blank\">$1</a>$4'", $text);
The problem is it’s also replacing the img URL, for example:
sometext sometext <img src="http://domain.com/image.jpg"> sometext sometext
is turned into:
sometext sometext <img src="<a href="http://domain.com/image.jpg">domain.com/image.jpg</a>"> sometext sometext
Please help.
Solution
This is a streamlined version of Gumbo's answer:
$html = <<< HTML
<html>
<body>
<p>
    This is a text with a <a href="http://example.com/1">link</a>
    and another <a href="http://example.com/2">http://example.com/2</a>
    and also another http://example.com with the latter being the
    only one that should be replaced. There is also images in this
    text, like <img src="http://example.com/foo"/> but these should
    not be replaced either. In fact, only URLs in text that is no
    a descendant of an anchor element should be converted to a link.
</p>
</body>
</html>
HTML;
Let's use an XPath expression that fetches only the text nodes that contain http://, https:// or ftp:// and that are not descendants of anchor elements.
$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);
$texts = $xPath->query(
    '/html/body//text()[
        not(ancestor::a) and (
        contains(.,"http://") or
        contains(.,"https://") or
        contains(.,"ftp://") )]'
);
The XPath above will give us a TextNode with the following data:
 and also another http://example.com with the latter being the
    only one that should be replaced. There is also images in this
    text, like
Since PHP 5.3 we could also call PHP functions from inside the XPath expression and use the regex pattern to select our nodes, instead of the three calls to contains.
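As a rough sketch of that idea (not part of the original answer), DOMXPath::registerPhpFunctions can expose preg_match to the XPath expression. The sample HTML and the shortened pattern below are assumptions made for illustration only. The result is compared against 1 explicitly rather than used as a bare predicate, so that a non-match is never treated as true.

```php
<?php
// Hypothetical sample input, chosen only for this illustration.
$html = '<html><body><p>See <a href="http://example.com/1">link</a>'
      . ' and http://example.com too. <img src="http://example.com/foo"/></p></body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$xPath = new DOMXPath($dom);
// Make PHP functions callable from XPath under the php: prefix.
$xPath->registerNamespace('php', 'http://php.net/xpath');
$xPath->registerPhpFunctions('preg_match');

// php:functionString passes the node's string value to preg_match.
// The pattern here is a simplified stand-in for the three contains() calls.
$texts = $xPath->query(
    '/html/body//text()[
        not(ancestor::a) and
        php:functionString("preg_match", "~(?:https?|ftp)://~i", .) = 1]'
);

foreach ($texts as $text) {
    echo trim($text->data), PHP_EOL; // prints: and http://example.com too.
}
```

The link text and the img attribute are not selected, because the first is a descendant of an anchor and the second is not a text node at all.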
Instead of splitting the text nodes apart in the standards-compliant way, we will use a document fragment and just replace the entire text node with the fragment. Non-standard in this case only means that the method we will be using for this, appendXML, is not part of the W3C specification of the DOM API.
foreach ($texts as $text) {
    $fragment = $dom->createDocumentFragment();
    $fragment->appendXML(
        preg_replace(
            "~((?:http|https|ftp)://(?:\S*?\.\S*?))(?=\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)~i",
            '<a href="$1">$1</a>',
            $text->data
        )
    );
    $text->parentNode->replaceChild($fragment, $text);
}
echo $dom->saveXML($dom->documentElement);
and this will then output:
<html><body>
<p>
    This is a text with a <a href="http://example.com/1">link</a>
    and another <a href="http://example.com/2">http://example.com/2</a>
    and also another <a href="http://example.com">http://example.com</a> with the latter being the
    only one that should be replaced. There is also images in this
    text, like <img src="http://example.com/foo"/> but these should
    not be replaced either. In fact, only URLs in text that is no
    a descendant of an anchor element should be converted to a link.
</p>
</body></html>
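For comparison, here is one possible way to do the replacement with standard DOM methods only (createElement, createTextNode, insertBefore, removeChild) instead of appendXML. This is a sketch under assumptions, not the answer author's code: the sample HTML and the simplified URL pattern are made up for the example. Because the nodes are built through the DOM rather than parsed from a string, characters such as & in a URL need no manual escaping.

```php
<?php
// Hypothetical sample input containing an ampersand in the URL.
$dom = new DOMDocument;
$dom->loadHTML('<html><body><p>Visit http://example.com/?a=1&amp;b=2 today.</p></body></html>');
$xPath = new DOMXPath($dom);

// Simplified pattern (an assumption): a scheme followed by anything that is
// not whitespace, an angle bracket, or a quote. The capture group makes
// preg_split return the URLs themselves at the odd indices.
$pattern = '~((?:https?|ftp)://[^\s<>"\']+)~i';

foreach ($xPath->query('/html/body//text()[not(ancestor::a)]') as $text) {
    $parts = preg_split($pattern, $text->data, -1, PREG_SPLIT_DELIM_CAPTURE);
    if (count($parts) < 2) {
        continue; // no URL in this text node
    }
    foreach ($parts as $i => $part) {
        if ($part === '') {
            continue; // skip empty pieces around a leading or trailing URL
        }
        if ($i % 2 === 1) {
            // Odd index: a matched URL, wrapped in an anchor element.
            $node = $dom->createElement('a');
            $node->setAttribute('href', $part);
            $node->appendChild($dom->createTextNode($part));
        } else {
            // Even index: plain text between URLs.
            $node = $dom->createTextNode($part);
        }
        $text->parentNode->insertBefore($node, $text);
    }
    // The original text node has been fully rebuilt in front of itself.
    $text->parentNode->removeChild($text);
}

echo $dom->saveXML($dom->documentElement);
```

The trade-off is a few more lines of code in exchange for not depending on the text being well-formed XML once the anchor markup is spliced in.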
                        