Find and Replace Keywords with Hyperlinks in the PHP DOM

Question

I'm trying to use the simple_html_dom PHP class to create a find-and-replace function. It should look for keywords and replace each one with a link to the keyword's definition, using the keyword as the link text.

How can I find "Dexia" and replace it with <a href="info.php?tag=dexia">Dexia</a> using this class, inside a string such as <div><p>The CEO of the Dexia bank has just decided to retire.</p></div>?

Solution

That's somewhat tricky, but you could do it this way:

$html = <<< HTML
<div><p>The CEO of the Dexia bank <em>has</em> just decided to retire.</p></div>
HTML;

I've added an emphasis element just to illustrate that it works with inline elements too.

Setup

$dom = new DOMDocument;
$dom->formatOutput = TRUE;
$dom->loadXML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//text()[contains(., "Dexia")]');

The interesting part above is, of course, the XPath. It queries the loaded DOM for all DOMText nodes that contain the needle "Dexia". The result is a DOMNodeList (as usual).
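To see exactly which nodes that XPath expression selects, the query can be wrapped in a small helper that returns the matching text values. This is a sketch only. findTextNodeValues() is a hypothetical name, and the needle is assumed to contain no double quote, because it is pasted into a double-quoted XPath string literal.

```php
<?php
// Hypothetical helper: return the values of all text nodes containing $needle.
// Assumes $needle has no double quote (it is embedded in the XPath literal).
function findTextNodeValues(DOMDocument $dom, string $needle): array
{
    $xpath = new DOMXPath($dom);
    $query = sprintf('//text()[contains(., "%s")]', $needle);
    $values = [];
    foreach ($xpath->query($query) as $node) {
        $values[] = $node->nodeValue;
    }
    return $values;
}

$dom = new DOMDocument;
$dom->loadXML('<div><p>The CEO of the Dexia bank <em>has</em> just decided to retire.</p></div>');

// Only the text node in front of <em> matches. XPath contains() is
// case-sensitive, so searching for "dexia" matches nothing.
var_dump(findTextNodeValues($dom, 'Dexia'));
var_dump(findTextNodeValues($dom, 'dexia'));
```

The text after <em> is a separate DOMText node. Because it does not contain the needle, it is never touched by the replacement.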

The replacement

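foreach($nodes as $node) {
    $link     = '<a href="info.php?tag=dexia">Dexia</a>';
    // wholeText is decoded text: escape it first, otherwise a literal & or <
    // in the text would make appendXML() fail on malformed markup.
    $escaped  = htmlspecialchars($node->wholeText, ENT_XML1);
    $replaced = str_replace('Dexia', $link, $escaped);
    // Parse the resulting markup into a fragment and swap it in for the text node.
    $newNode  = $dom->createDocumentFragment();
    $newNode->appendXML($replaced);
    $node->parentNode->replaceChild($newNode, $node);
}
echo $dom->saveXML($dom->documentElement);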

The wholeText of the found $node is "The CEO of the Dexia bank ", even though the P element it sits in contains more text than that. This is because $node has a sibling DOMElement, the emphasis element, right after "bank", and wholeText only joins a text node with its directly adjacent text nodes. I create the link as a string instead of a node and use it to replace all occurrences of "Dexia" in the wholeText (regardless of word boundaries, which would be a good case for a regex). Then I create a DocumentFragment from the resulting string and replace the DOMText node with it.
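str_replace() also matches "Dexia" inside longer words such as "Dexiabank", and a regular expression fixes that. Below is a minimal sketch with word boundaries. It assumes a case-sensitive ASCII keyword without double quotes, and linkKeyword() is a hypothetical name. As a deliberate swap, it splits the text with preg_split() and builds the fragment from standard DOM nodes instead of calling appendXML(). That way the text is never re-parsed as XML, and characters such as & need no manual escaping.

```php
<?php
// Hypothetical helper: wrap whole-word occurrences of $keyword in a link.
// Assumes a case-sensitive ASCII keyword without double quotes, because it is
// embedded in a double-quoted XPath literal and matched with \b boundaries.
function linkKeyword(DOMDocument $dom, string $keyword, string $href): void
{
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(sprintf('//text()[contains(., "%s")]', $keyword));
    $pattern = '/\b' . preg_quote($keyword, '/') . '\b/';

    foreach ($nodes as $node) {
        // Pieces of text between whole-word matches.
        $parts = preg_split($pattern, $node->nodeValue);
        if (count($parts) === 1) {
            continue; // the keyword only appeared inside a longer word
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $text) {
            if ($i > 0) {
                // A link goes between every pair of neighbouring pieces.
                $a = $dom->createElement('a');
                $a->setAttribute('href', $href); // setAttribute() escapes the value
                $a->appendChild($dom->createTextNode($keyword));
                $fragment->appendChild($a);
            }
            $fragment->appendChild($dom->createTextNode($text));
        }
        $node->parentNode->replaceChild($fragment, $node);
    }
}

$dom = new DOMDocument;
$dom->loadXML('<div><p>Dexia &amp; Dexiabank <em>are</em> not the same.</p></div>');
linkKeyword($dom, 'Dexia', 'info.php?tag=dexia');
echo $dom->saveXML($dom->documentElement), "\n";
```

Only the standalone "Dexia" becomes a link. "Dexiabank" and the escaped ampersand come through unchanged.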

W3C vs PHP

Using DOMDocumentFragment::appendXML() is a non-standard approach, because the method is not part of the W3C DOM specification.

If you wanted to do the replacement with the standard API, you would first have to create the A element as a new DOMElement. Then you would find the offset of "Dexia" in the nodeValue of the DOMText node and split the DOMText node into two nodes at that position. Remove "Dexia" from the start of the returned second node and insert the link element before it. Repeat this procedure with that second node until no more "Dexia" strings are found in it. Here is how to do it for one occurrence of Dexia:

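foreach($nodes as $node) {
    // Build <a href="info.php?tag=dexia">Dexia</a> as a real element node.
    $link = $dom->createElement('a', 'Dexia');
    $link->setAttribute('href', 'info.php?tag=dexia');
    // Split at the first "Dexia": $node keeps the text before it, and
    // $newNode (starting with "Dexia") is inserted as $node's next sibling.
    $offset  = strpos($node->nodeValue, 'Dexia');
    $newNode = $node->splitText($offset);
    // Drop "Dexia" from the start of the second node and put the link in its place.
    $newNode->deleteData(0, strlen('Dexia'));
    $node->parentNode->insertBefore($link, $newNode);
}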

And finally, the output:

<div>
  <p>The CEO of the <a href="info.php?tag=dexia">Dexia</a> bank <em>has</em> just decided to retire.</p>
</div>
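The standard-API loop above handles only the first occurrence in each text node. The "repeat with the sibling node" step can be sketched as a while loop that keeps splitting the remainder. This is a sketch, and linkAllOccurrences() is a hypothetical name. It assumes ASCII text, because strpos() and strlen() count bytes while splitText() and deleteData() count characters. It also assumes the keyword contains no double quote. The link text is added with createTextNode(), which escapes its content.

```php
<?php
// Hypothetical helper: link every occurrence of $keyword using only
// standard DOM methods (splitText, deleteData, insertBefore).
// Assumes ASCII text and a keyword without double quotes.
function linkAllOccurrences(DOMDocument $dom, string $keyword, string $href): void
{
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(sprintf('//text()[contains(., "%s")]', $keyword));
    foreach ($nodes as $node) {
        $current = $node;
        while (($offset = strpos($current->nodeValue, $keyword)) !== false) {
            $link = $dom->createElement('a');
            $link->setAttribute('href', $href);
            $link->appendChild($dom->createTextNode($keyword));
            // $current keeps the text before the match; $rest starts with it.
            $rest = $current->splitText($offset);
            $rest->deleteData(0, strlen($keyword));
            $current->parentNode->insertBefore($link, $rest);
            // Keep searching in the text that follows the new link.
            $current = $rest;
        }
    }
}

$dom = new DOMDocument;
$dom->loadXML('<div><p>Dexia says the CEO of the Dexia bank <em>has</em> just decided to retire.</p></div>');
linkAllOccurrences($dom, 'Dexia', 'info.php?tag=dexia');
echo $dom->saveXML($dom->documentElement), "\n";
```

When a match starts at offset 0, the split leaves an empty text node in front of the link. An empty text node serializes to nothing, so the output is unaffected.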

08-31 08:15