Can BeautifulSoup preserve CDATA sections?

Problem description

I'm using BeautifulSoup to read, modify, and write an XML file. I'm having trouble with CDATA sections being stripped out. Here's a simplified example.

The offending XML file:

<?xml version="1.0" ?>
<foo>
    <bar><![CDATA[
        !@#$%^&*()_+{}|:"<>?,./;'[]\-=
    ]]></bar>
</foo>

And here's the Python script.

from bs4 import BeautifulSoup

# Parse with the lxml-backed "xml" parser, closing the file when done
with open("cdata.xml", "r") as xmlfile:
    soup = BeautifulSoup(xmlfile, "xml")
print(soup)

Here's the output. Note the CDATA section tags are missing.

<?xml version="1.0" encoding="utf-8"?>
<foo>
<bar>
        !@#$%^&amp;*()_+{}|:"&lt;&gt;?,./;'[]\-=
    </bar>
</foo>

I also tried printing soup.prettify(formatter="xml") and got the same result with slightly different whitespace. There isn't much in the docs about reading in CDATA sections, so maybe this is an lxml thing?

Is there a way to tell BeautifulSoup to preserve CDATA sections?

Update: Yes, it's an lxml thing (see http://lxml.de/api.html#cdata). So the question becomes: is it possible to tell BeautifulSoup to initialize lxml with strip_cdata=False?
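
For reference, here is a minimal sketch of the lxml behavior the update refers to, calling lxml directly rather than going through BeautifulSoup. It parses the same XML twice, once with lxml's default strip_cdata=True and once with strip_cdata=False, and compares the serialized output. The names XML_SOURCE and serialize are illustrative only, and the XML is inlined instead of being read from cdata.xml. It does not show whether BeautifulSoup can pass this option through to lxml.

```python
from lxml import etree

# Same content as the question's cdata.xml, inlined so the sketch is
# self-contained. A raw bytes literal keeps the backslash as-is.
XML_SOURCE = rb"""<?xml version="1.0" ?>
<foo>
    <bar><![CDATA[
        !@#$%^&*()_+{}|:"<>?,./;'[]\-=
    ]]></bar>
</foo>
"""


def serialize(strip_cdata):
    """Parse XML_SOURCE with the given strip_cdata setting and re-serialize it."""
    parser = etree.XMLParser(strip_cdata=strip_cdata)
    root = etree.fromstring(XML_SOURCE, parser)
    return etree.tostring(root, encoding="unicode")


# lxml's default: the CDATA section becomes ordinary text, escaped on output.
stripped = serialize(strip_cdata=True)

# With strip_cdata=False the CDATA section survives the round trip.
preserved = serialize(strip_cdata=False)

print(stripped)
print(preserved)
```

With the default setting, the special characters come back escaped, as in the output shown earlier. With strip_cdata=False, the CDATA wrapper is written back out by lxml.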

Recommended answer