HTML Agility Pack: <script> contents get corrupted when converting to XML

Question

I have an HTML file with a <script> element in it:

<html>
   <script type="application/custom+xml">
   <my><xml><goes><here/></goes></xml></my>
   </script>
</html>

I parse it with HTML Agility Pack and then convert it to XML:

var html = new HtmlDocument();
// ... load the HTML file shown above into html ...
html.OptionOutputAsXml = true;   // write the document out as XML
html.Save(stream);
...
XDocument xml = XDocument.Load(stream);

I then want to use LINQ to XML to look at the contents of the script tag, which should contain my XML as CDATA. Instead, HTML Agility Pack escapes the markup, and I end up with this:

<html>
<script type="application/custom+xml">
//<![CDATA[
&lt;my&gt;&lt;xml&gt;&lt;goes&gt;&lt;here/&gt;&lt;/goes&gt;&lt;/xml&gt;&lt;/my&gt;
//]]>//
</script>
</html>

Does anyone know how I can tell HTML Agility Pack not to escape the contents of the script tag?

Answer

That's rather easy. By default, HTML Agility Pack treats the content of script tags as CDATA. This is set up in the static constructor of the HtmlNode class, like so:

ElementsFlags.Add("script", HtmlElementFlag.CData);

To change this you don't have to modify HTML Agility Pack itself. All that's needed is one line, run before your parsing code or just once when your program starts:

HtmlNode.ElementsFlags.Remove("script");

With that line added before the parsing code, it works for me.
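To show where the line goes relative to the code in the question, here is a minimal sketch of the whole flow. It assumes the HtmlAgilityPack NuGet package is referenced. The inline source string and the in-memory StringWriter are illustrative choices standing in for the file and stream from the question, and the result has not been checked against a specific library version.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Run once, before any document is parsed: drop the flag that makes
        // the parser treat <script> content as raw CDATA text.
        HtmlNode.ElementsFlags.Remove("script");

        // Illustrative input: the same markup as in the question, inlined.
        const string source =
            "<html><script type=\"application/custom+xml\">" +
            "<my><xml><goes><here/></goes></xml></my>" +
            "</script></html>";

        var html = new HtmlDocument();
        html.OptionOutputAsXml = true;
        html.LoadHtml(source);

        // Save to an in-memory writer (assumption: replaces the stream
        // used in the question) and hand the text to LINQ to XML.
        var writer = new StringWriter();
        html.Save(writer);
        XDocument xml = XDocument.Parse(writer.ToString());

        // Inspect the script element and whatever children it now has.
        XElement script = xml.Descendants("script").FirstOrDefault();
        Console.WriteLine(script);
    }
}
```

Note that HtmlNode.ElementsFlags is static, so removing the entry affects every document parsed afterwards in the same process, including pages whose script tags contain real JavaScript.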

