PHP DOMDocument->getElementByID adds Â in place of empty <span>

Problem description

I'm using PHP's DOMDocument object to parse some HTML (fetched with cURL). When I get an element by ID and output it, any empty <span> </span> tags gain an extra character and become <span>Â </span>.

Code:

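<?php
    // $handle is assumed to be a cURL handle created earlier with
    // CURLOPT_RETURNTRANSFER enabled, so curl_exec() returns the page as a string.
    $document = new DOMDocument();
    $document->validateOnParse = true;

    $document->loadHTML( curl_exec($handle) );
    curl_close($handle);

    // __ELEMENT_ID__ is a placeholder for the ID of the element to extract.
    $element = $document->getElementById( __ELEMENT_ID__ );

    echo $document->saveHTML();          // whole page
    echo $document->saveHTML($element);  // only the selected element
?>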

The $document->saveHTML() call behaves as expected and prints the entire page. But, as I said above, the echo $document->saveHTML($element) call turns empty <span> </span> tags into <span>Â </span>.

This happens to all <span> </span> tags within $element.

What in this process (getting the element by ID and outputting it) inserts this extra character? I could work around it, but I'm more interested in getting to the root cause.

Recommended answer

I was able to fix the problem by setting the character encoding of the page. The page I was fetching did not declare a character encoding, and my own page was just a snippet with no header information. When I added

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>

the problem went away.
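
A likely explanation, offered as an assumption rather than something confirmed in the question: the "empty" spans actually contain a non-breaking space (U+00A0), which UTF-8 encodes as the two bytes 0xC2 0xA0. When no charset is declared, loadHTML() is commonly reported to fall back to a single-byte encoding such as ISO-8859-1, so 0xC2 is read as a separate Â character. The sketch below reproduces the setup with and without the meta tag. The fragment, the ID "target", and the helper renderElement() are hypothetical and only serve the illustration.

```php
<?php
// Sketch only: $fragment, the ID "target", and renderElement() are
// hypothetical stand-ins, not taken from the original question.

// Parse $html and return the serialized element with the given ID.
function renderElement(string $html, string $id): string
{
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $document = new DOMDocument();
    $document->loadHTML($html);
    $element = $document->getElementById($id);
    return $document->saveHTML($element);
}

// UTF-8 fragment with no charset declaration; the span holds U+00A0.
$fragment = "<div id=\"target\"><span>\u{00A0}</span></div>";

// The meta tag from the answer, prepended so the parser sees UTF-8.
$meta = '<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head>';

$broken = renderElement($fragment, 'target');          // may show the stray character
$fixed  = renderElement($meta . $fragment, 'target');  // charset declared up front

echo $broken, PHP_EOL, $fixed, PHP_EOL;
```

Declaring the charset before parsing keeps the non-breaking space as a single character in the DOM, so nothing extra appears when the element is serialized. Whether the unfixed call shows the stray character may depend on the libxml version in use.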

