Problem description
I'm using XSLT to extract some HTML content with special characters (like &nbsp;) from an XML file. The content is stored in <content> nodes. I have defined most special characters like this: <!ENTITY nbsp "&#160;">, so this expression works perfectly fine:
<xsl:copy-of select="content" disable-output-escaping="yes"/>
Now, I want to add target="_blank" to every link found within that content. This is the solution I came up with:
<xsl:template match="a" mode="html">
  <a>
    <!-- @href names the attribute explicitly; @* would take the value of the first attribute, whatever it is -->
    <xsl:attribute name="href"><xsl:value-of select="@href"/></xsl:attribute>
    <xsl:attribute name="target">_blank</xsl:attribute>
    <xsl:apply-templates select="text()|*"/>
  </a>
</xsl:template>
And instead of the "copy-of" element I use this:
<xsl:apply-templates select="content" mode="html"/>
Now all those special characters (and nbsp too) have disappeared from the output. How do I keep them? It seems that disable-output-escaping="yes" doesn't help here.
OK, I'm using the XSLTProcessor class in PHP. The disable-output-escaping attribute didn't actually give an error, but when I removed it, the output was the same, with all the nbsp's, so it didn't matter.
UPD. With the XSL template I have shown before, my input sample:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE page SYSTEM "html-entities.xsl">
<content>There is a&nbsp;non-breaking <a href="http://localhost">space</a> inside.</content>
html-entities.xsl:
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY nbsp "&#160;">
PHP code:
$xp = new XSLTProcessor();
$xsl = new DOMDocument();
$xsl->load($xsl_filename);
$xp->importStylesheet($xsl);
$xml_doc = new DOMDocument();
$xml_doc->resolveExternals = true;
$xml_doc->load($xml_filename);
$html = $xp->transformToXML($xml_doc);
My current output:
There is anon-breaking <a href="http://localhost" target="_blank">space</a> inside.
My desired output:
There is a&nbsp;non-breaking <a href="http://localhost" target="_blank">space</a> inside.
Basically, whether the source of the input XML document has a character reference like &#160;, an entity reference like &nbsp;, or the character itself literally does not matter to XSLT: it makes no difference to how the input is processed or how the output looks, because XSLT operates on a tree with Unicode characters stored in text nodes. At least that is the theory; your PHP code seems to work with a DOM tree model which might store entity reference nodes, but even then that shouldn't matter for XSLT. In the input tree there should be text nodes containing Unicode characters (one of which could be the non-breaking space character, Unicode 160), and if you copy such text to the output, the result tree has a text node with the same Unicode characters.
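To make the point about copying text nodes concrete, here is a minimal sketch of a stylesheet that copies everything below content unchanged and only rewrites links. It is not the asker's stylesheet: the identity template in the html mode and the root-level entry template are assumptions added for illustration. It also assumes the parser has already expanded &nbsp; into a text node holding U+00A0 (in PHP's DOM extension, the substituteEntities property of DOMDocument is, to my knowledge, the setting that controls this).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Entry point (assumed): process the children of content in the html mode,
       so the content wrapper itself is not copied to the output -->
  <xsl:template match="/">
    <xsl:apply-templates select="content/node()" mode="html"/>
  </xsl:template>

  <!-- Identity template for the html mode: copies elements, attributes
       and text nodes as they are -->
  <xsl:template match="@*|node()" mode="html">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="html"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for links: keep the existing attributes, add target,
       then continue with the children in the same mode -->
  <xsl:template match="a" mode="html">
    <a>
      <xsl:apply-templates select="@*" mode="html"/>
      <xsl:attribute name="target">_blank</xsl:attribute>
      <xsl:apply-templates select="node()" mode="html"/>
    </a>
  </xsl:template>
</xsl:stylesheet>
```

Because the identity template matches node(), text nodes pass through untouched, so a non-breaking space that is present in the input tree should still be present in the result tree.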
For the output method html, some XSLT processors (Saxon 6.5.5 for instance) might do you the favour of ensuring that characters defined as entities in HTML are serialized with the corresponding entity reference, but even if they don't, the serialization of the result tree should be a file with the proper Unicode characters, encoded as directed by the encoding attribute of the xsl:output element.
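As a sketch of how the encoding attribute steers serialization, the stylesheet below writes a literal U+00A0 into the result tree and asks for US-ASCII output. US-ASCII cannot represent that character, so the serializer is expected to escape it, typically as &nbsp; or &#160;; which of the two appears depends on the processor. The p element and the sample sentence are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- US-ASCII cannot hold U+00A0, so the serializer has to escape it -->
  <xsl:output method="html" encoding="US-ASCII"/>

  <xsl:template match="/">
    <!-- &#160; is parsed into the character U+00A0 before XSLT sees it;
         the result tree only contains the character, not the reference -->
    <p>There is a&#160;non-breaking space inside.</p>
  </xsl:template>
</xsl:stylesheet>
```

With encoding="UTF-8" instead, the same character may simply be written as its UTF-8 bytes. A browser treats all of these forms as the same character.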
Your current result, which completely drops the character (e.g. There is anon-breaking), does not make sense to me.