Problem description
I've got an HTML file with a <script> tag in it:
<html>
<script type="application/custom+xml">
<my><xml><goes><here/></goes></xml></my>
</script>
</html>
I parse it with HTML Agility Pack and then convert it to XML:
HtmlDocument html = new HtmlDocument();
// ... load the HTML shown above into html ...
html.OptionOutputAsXml = true;
html.Save(stream);
...
XDocument xml = XDocument.Load(stream);
I then want to use LINQ to XML to look at the contents of the script tag, which should contain my XML as child elements. But HTML Agility Pack treats the content as text and wraps it in a commented-out CDATA section, so I end up with this instead:
<html>
<script type="application/custom+xml">
//<![CDATA[
<my><xml><goes><here/></goes></xml></my>
//]]>//
</script>
</html>
Does anyone know how I can tell HTML Agility Pack not to escape the contents of the script tag?
Recommended answer
That's rather easy. By default, the Agility Pack treats the content of script tags as CDATA. This is set up in the static constructor of the HtmlNode class, like so:
ElementsFlags.Add("script", HtmlElementFlag.CData);
To change this, you don't have to modify the Agility Pack. All that's needed is one line, run before your parsing code or just once when your program starts:
HtmlNode.ElementsFlags.Remove("script");
Just add that before your code; with that in place, it works for me.
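For illustration, here is a minimal sketch of how the pieces from the question and the answer might fit together. The class name ScriptAsXml and the method name Convert are hypothetical, and the use of a MemoryStream (rewound before it is read back) is an assumption about what the elided "..." in the question does. The exact XML that comes out may depend on the Agility Pack version, so no output is shown.

```csharp
using System.IO;
using System.Xml.Linq;
using HtmlAgilityPack;

// Hypothetical helper; the names are illustrative, not from the original post.
public static class ScriptAsXml
{
    public static XDocument Convert(string htmlText)
    {
        // Drop the default CData flag for <script>. ElementsFlags is static,
        // so this affects every later parse in the process; running it once
        // at startup is enough. Remove is a no-op if the key is already gone.
        HtmlNode.ElementsFlags.Remove("script");

        var html = new HtmlDocument();
        html.LoadHtml(htmlText);
        html.OptionOutputAsXml = true;

        using (var stream = new MemoryStream())
        {
            html.Save(stream);
            stream.Position = 0; // rewind before reading the XML back
            return XDocument.Load(stream);
        }
    }
}
```

With the flag removed, the content of the script tag should be parsed as ordinary child elements, so something like xml.Descendants("script") can then be queried with LINQ to XML in the usual way.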