Defining a default namespace in lxml

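Problem description

When rendering XHTML with lxml, everything works fine unless you happen to use Firefox, which seems unable to deal with namespace-prefixed XHTML elements and JavaScript. Opera executes the JavaScript fine (this applies to both jQuery and MathJax) whether or not the XHTML namespace has a prefix (h: in my case), but in Firefox the scripts abort with weird errors (this.head is undefined in the case of MathJax).

I know about the register_namespace function, but it accepts neither None nor "" as a namespace prefix. I've heard about _namespace_map in the lxml.etree module, but my Python complains that this attribute doesn't exist (version issues?).

Is there any other way to remove the namespace prefix for the XHTML namespace? Note that str.replace, as suggested in an answer to another, related question, is not a method I can accept, as it is not aware of XML semantics and might easily screw up the resulting document.

As requested, here are two ready-to-use examples, one with namespace prefixes and one without. The first displays 0 in Firefox (wrong) and the second displays 1 (correct). Opera renders both correctly. This is obviously a Firefox bug, but it only serves as a rationale for wanting prefixless XHTML with lxml; there are other good reasons, such as reducing traffic for mobile clients (even h: adds up if you consider tens or hundreds of HTML tags).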

Solution

This XSL transformation removes all prefixes from content, while maintaining namespaces defined in the root node:

import lxml.etree as ET

# Bytes literal: in Python 3, lxml rejects str input that carries an encoding declaration.
content = b'''\
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<h:html xmlns:h="http://www.w3.org/1999/xhtml" xmlns:ml="http://foo">
  <h:head>
    <h:title>MathJax Test Page</h:title>
    <h:script type="text/javascript"><![CDATA[
      function test() {
        alert(document.getElementsByTagName("p").length);
      };
    ]]></h:script>
  </h:head>
  <h:body onload="test();">
    <h:p>test</h:p>
    <ml:foo></ml:foo>
  </h:body>
</h:html>
'''
dom = ET.fromstring(content)

xslt = '''\
<xsl:stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no"/>

<!-- identity transform for everything else -->
<xsl:template match="/|comment()|processing-instruction()|*|@*">
    <xsl:copy>
      <!-- select attributes too, so non-XHTML elements keep theirs -->
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
</xsl:template>

<!-- remove NS from XHTML elements -->
<xsl:template match="*[namespace-uri() = 'http://www.w3.org/1999/xhtml']">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()" />
    </xsl:element>
</xsl:template>

<!-- remove NS from XHTML attributes -->
<xsl:template match="@*[namespace-uri() = 'http://www.w3.org/1999/xhtml']">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="." />
    </xsl:attribute>
</xsl:template>
</xsl:stylesheet>
'''

xslt_doc = ET.fromstring(xslt)
transform = ET.XSLT(xslt_doc)
dom = transform(dom)

# tostring() returns bytes when an encoding is given; decode before printing.
print(ET.tostring(dom, pretty_print=True,
                  encoding='utf-8').decode('utf-8'))

yields

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>MathJax Test Page</title>
    <script type="text/javascript">
      function test() {
        alert(document.getElementsByTagName("p").length);
      };
    </script>
  </head>
  <body onload="test();">
    <p>test</p>
    <ml:foo xmlns:ml="http://foo"/>
  </body>
</html>
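
As a complementary sketch, assuming you build the tree with lxml yourself rather than post-processing an existing document: register_namespace refuses None, but the nsmap argument of lxml's element factories does accept None as a key, and that declares a default (unprefixed) namespace. The names XHTML_NS, html, body and p below are purely illustrative and not part of the original question.

```python
from lxml import etree

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Mapping the prefix None to a URI declares that URI as the default namespace.
html = etree.Element("{%s}html" % XHTML_NS, nsmap={None: XHTML_NS})
body = etree.SubElement(html, "{%s}body" % XHTML_NS)
p = etree.SubElement(body, "{%s}p" % XHTML_NS)
p.text = "test"

# encoding="unicode" makes tostring() return str instead of bytes.
serialized = etree.tostring(html, encoding="unicode")
print(serialized)
```

The elements are still in the XHTML namespace; only the serialized prefix changes. This does not help with a document that was already parsed with h: prefixes, which is the case the XSL transformation above handles.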

09-01 21:03