Question
I am using lxml 2.2.8 and trying to transform some existing HTML files into Django templates. The only problem I am having is that lxml URL-encodes the name and href attributes of anchors. For example:
<xsl:template match="a">
  <!-- anchor attribute href is urlencoded but the title is escaped -->
  <a href="{{{{item.get_absolute_url}}}}" title="{{{{item.title}}}}">
    <!-- name tag is urlencoded -->
    <xsl:attribute name="name">{{item.name}}</xsl:attribute>
    <!-- but other attributes are not -->
    <xsl:attribute name="nid">{{item.nid}}</xsl:attribute>
    <xsl:attribute name="class">{{item.class_one}}</xsl:attribute>
    <xsl:apply-templates/>
  </a>
</xsl:template>
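In this template, href and title are literal result attributes, so XSLT 1.0 treats their values as attribute value templates: {expr} is evaluated as an XPath expression and a doubled brace stands for one literal brace. The text content of <xsl:attribute>, by contrast, is plain text, which is why the stylesheet needs four braces in one place and two in the other to end up with a Django {{ }} tag. The pure-Python sketch below models only that brace rule; the function name expand_avt is hypothetical, and real XPath evaluation (including braces inside quoted strings) is left out.

```python
def expand_avt(template, evaluate):
    """Expand a simplified XSLT 1.0 attribute value template (hypothetical helper).

    '{{' and '}}' produce literal braces; '{expr}' is replaced by evaluate(expr).
    Braces inside quoted XPath strings are NOT handled in this sketch.
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "{":
            if template.startswith("{{", i):
                out.append("{")  # escaped opening brace
                i += 2
            else:
                end = template.index("}", i)  # ValueError if unterminated
                out.append(evaluate(template[i + 1:end]))
                i = end + 1
        elif ch == "}":
            if template.startswith("}}", i):
                out.append("}")  # escaped closing brace
                i += 2
            else:
                raise ValueError("unmatched '}' in attribute value template")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Four braces in the stylesheet source become two in the output.
print(expand_avt("{{{{item.get_absolute_url}}}}", lambda expr: "unused"))
# prints {{item.get_absolute_url}}
```

With only two braces in a literal attribute, the same rule would leave a single brace, which is not a valid Django tag.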
This produces HTML like this:
<a href="%7B%7Bitem.get_absolute_url%7D%7D"
title="{{item.title}}" name="%7B%7Bitem.name%7D%7D"
nid="{{item.nid}}" class="{{item.class_one}}">more info</a>
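The pattern here (href and name mangled, title, nid and class untouched) is consistent with how, as far as I know, libxml2's HTML serializer (which lxml.html.tostring() relies on) handled "URI attributes" in releases of that era: href, action and src on any element, plus name on <a>, go through a URI-escaping step, while every other attribute is only entity-escaped. The Python 3 sketch below is a rough model of that rule built on urllib.parse.quote; the function names and the exact safe-character set are assumptions for illustration, not lxml or libxml2 API, and newer libxml2 releases may behave differently.

```python
from urllib.parse import quote

# Rough model (an assumption, not lxml/libxml2 API) of the HTML serializer:
# these attributes are URI-escaped on any element...
URI_ATTRIBUTES = {"href", "action", "src"}


def is_uri_attribute(attr_name, element_name):
    """Return True if the modeled rule would URI-escape this attribute."""
    attr_name, element_name = attr_name.lower(), element_name.lower()
    # ...and "name" is escaped only on <a> elements.
    return attr_name in URI_ATTRIBUTES or (attr_name == "name" and element_name == "a")


def serialize_attr_value(attr_name, element_name, value):
    """Return the value as the modeled HTML serializer would write it.

    Only the URI-escaping step is modeled; ordinary entity escaping
    (&amp;, &quot;, ...) is left out.
    """
    if not is_uri_attribute(attr_name, element_name):
        return value
    # Leading blanks are skipped; unreserved characters plus this set are
    # kept, everything else (including { and }) becomes %XX.
    return quote(value.lstrip(" \t\r\n"), safe="@/:=?;#%&,+<>!*'()")


for attr in ("href", "title", "name", "nid", "class"):
    print(attr, serialize_attr_value(attr, "a", "{{item.%s}}" % attr))
```

Under this model, renaming the attributes is no escape for href, so avoiding the HTML serializer is the more direct route.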
What I want is this:
<a href="{{item.get_absolute_url}}">more info</a>
Is there a way to disable the (automatic) URL-encoding that lxml is doing?
Here is (roughly) the code I am using to generate and parse the file:
from lxml import etree, html
from StringIO import StringIO
doc = StringIO(
'''<html>
<head>
<title>An experiment</title>
</head>
<body>
<p class="one">This is an interesting paragraph detailing the inner workings of something</p>
<p class="two">paragraph with <a href="/link/to/more">more info</a></p>
<p>posted by: me</p>
</body>
</html>''')
stylesheet = StringIO(
'''<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="xhtml xsl">
  <xsl:template match="p[@class='one']">
    <xsl:copy>
      <!-- when adding an attribute with the xsl:attribute tag, -->
      <!-- the curly braces are not escaped, i.e. you don't have -->
      <!-- to double them up -->
      <xsl:attribute name="class">{{item.class_one}}</xsl:attribute>
      <xsl:attribute name="nid">{{item.nid}}</xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="p[@class='two']">
    <!-- but double 'em up in this instance -->
    <p class="{{{{item.class_two}}}}">
      <xsl:apply-templates/>
    </p>
  </xsl:template>
  <xsl:template match="a">
    <!-- anchor attribute href is urlencoded but the title is escaped -->
    <a href="{{{{item.get_absolute_url}}}}" title="{{{{item.title}}}}">
      <!-- name tag is urlencoded -->
      <xsl:attribute name="name">{{item.name}}</xsl:attribute>
      <!-- but other attributes are not -->
      <xsl:attribute name="nid">{{item.nid}}</xsl:attribute>
      <xsl:attribute name="class">{{item.class_one}}</xsl:attribute>
      <xsl:apply-templates/>
    </a>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <!-- select="@*|node()" so attributes are copied too, -->
      <!-- not just child nodes -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
''')
def parse_doc():
    xsl = etree.parse(stylesheet)
    trans = etree.XSLT(xsl)
    root = html.parse(doc, etree.HTMLParser(encoding="windows-1252"))
    transformed = trans(root)
    # serialized with the HTML serializer (this is where href/name get encoded)
    print html.tostring(transformed)

if __name__ == '__main__':
    parse_doc()
The only difference from my real setup is that the actual files are all malformed HTML :)
Answer
Maybe you can use the XML serializer instead of the HTML one. lxml.html.tostring() URL-encodes the href value, while etree.tostring() leaves it alone:
>>> from lxml import etree, html
>>>
>>> t = etree.XML('<a href="{{x}}" />')
>>>
>>> etree.tostring(t)
'<a href="{{x}}"/>'
>>> html.tostring(t)
'<a href="%7B%7Bx%7D%7D"></a>'
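The same approach applied to an XSLT result, as a sketch that assumes a current lxml on Python 3 (so tostring() returns bytes) and uses a hypothetical, trimmed-down stand-in for the question's stylesheet. etree.tostring() serializes the result tree as XML and keeps the Django tags intact, and lxml.html.tostring() accepts method="xml" to the same effect. Note that str(result) follows the stylesheet's <xsl:output> instead; because the question's result tree has an html root element, XSLT 1.0 then defaults to the html output method, so declaring <xsl:output method="xml"/> should be needed in that case.

```python
from lxml import etree, html

# Hypothetical, trimmed-down stylesheet modeled on the question.
STYLESHEET = etree.XML(b'''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="a">
    <a href="{{{{item.get_absolute_url}}}}">
      <xsl:attribute name="name">{{item.name}}</xsl:attribute>
      <xsl:apply-templates/>
    </a>
  </xsl:template>
</xsl:stylesheet>''')

transform = etree.XSLT(STYLESHEET)
result = transform(etree.XML(b'<a href="/link/to/more">more info</a>'))

# etree.tostring() uses the XML serializer: attribute values are not URL-encoded.
as_xml = etree.tostring(result)

# lxml.html.tostring() has a method parameter; method="xml" skips the
# HTML serializer in the same way.
as_xml_via_html = html.tostring(result, method="xml")

print(as_xml)
print(as_xml_via_html)
```

One trade-off for HTML templates: the XML serializer writes empty elements in self-closing form (for example <br/>), which browsers do not treat as closed for non-void elements such as <script>.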