Problem description
I'm using Apache Xerces 2.11.0 and Apache Xalan 2.7.1, and I'm having problems with additional carriage return characters in the serialized XML.
I have this (pseudo)code:
String myString = ...;
Document doc = ...;
Element item = doc.createElement("item");
item.appendChild(doc.createCDATASection(myString));
Transformer transformer = ...;
ByteArrayOutputStream stream = new ByteArrayOutputStream();
Result result = new StreamResult(stream);
transformer.transform(new DOMSource(doc), result);
Now myString contains line breaks (\r\n); it is actually base64-encoded data. But when I look at the serialized output, there are additional \r characters.
Input:
Line 1 \r\n
Line 2 \r\n
Line 3 \r\n
Output:
Line 1 \r\r\n
Line 2 \r\r\n
Line 3 \r\r\n
If I use createTextNode instead of createCDATASection, the output becomes even more interesting:
Line 1 &#13;\r\n
Line 2 &#13;\r\n
Line 3 &#13;\r\n
The additional character seems to be introduced during serialization; the DOM tree seems to be correct (according to getTextContent()).
Why is this happening? What can I do to fix this?
Recommended answer
I guess you are seeing this problem on Windows and not on Linux/Solaris/Mac. The Xalan serializer (org.apache.xml.serializer.ToStream.java) gets the line separator from System.getProperty("line.separator"). When the serializer writes \r\n, it treats the \n as the end-of-line sequence and replaces it with the line separator, so it actually writes \r + lineSeparator = \r\r\n. Although this sounds strange, it is not a bug; see [1]. But since it was frequently reported as a bug, a Xalan extension property was added [2]. So you can set it programmatically:
transformer.setOutputProperty("{http://xml.apache.org/xalan}line-separator","\n");
or
<xsl:output xalan:line-separator="&#10;" />
where xalan is a prefix associated with the URL "http://xml.apache.org/xalan".
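For context, here is a minimal sketch of how the programmatic setting might fit into the code from the question. The class name LineSeparatorExample and the helper serialize are made up for illustration, and the document is built from scratch rather than taken from the question's elided setup. It uses whichever JAXP transformer is on the classpath; with Xalan 2.7.1 present the property should take effect as described above, while other implementations may accept the namespaced property and simply ignore it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class; not part of the original question or answer.
public class LineSeparatorExample {

    // Builds <item><![CDATA[content]]></item> and serializes it with the
    // Xalan line-separator extension property set to "\n".
    static String serialize(String content) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element item = doc.createElement("item");
        item.appendChild(doc.createCDATASection(content));
        doc.appendChild(item);

        // Identity transformer, used here only for serialization.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Ask the serializer to emit "\n" instead of the platform line separator.
        transformer.setOutputProperty("{http://xml.apache.org/xalan}line-separator", "\n");

        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(stream));
        // Assumes the default output encoding (UTF-8) was not overridden.
        return new String(stream.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String xml = serialize("Line 1 \r\nLine 2 \r\n");
        // Make control characters visible when printing.
        System.out.println(xml.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Printing the result with the control characters escaped makes it easy to check whether any \r\r\n sequences remain on your platform.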
[1] https://issues.apache.org/jira/browse/XALANJ-1660
[2] https://issues.apache.org/jira/browse/XALANJ-2093
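As a rough illustration of the behaviour the answer describes, the following sketch models the serializer's newline handling. It is a simplified model, not Xalan's actual code: the class LineSeparatorSketch and the method writeWithSeparator are hypothetical names, and the only rule modelled is "every \n is replaced by the configured line separator, everything else is copied through".

```java
// Hypothetical, simplified model of the newline handling described in the answer.
public class LineSeparatorSketch {

    // Copies text to the output, replacing each '\n' with lineSeparator.
    // A '\r' in the input is copied unchanged, which is what produces "\r\r\n"
    // when the input already contains "\r\n" and the separator is "\r\n".
    static String writeWithSeparator(String text, String lineSeparator) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\n') {
                out.append(lineSeparator);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Makes control characters visible for printing.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String input = "Line 1 \r\nLine 2 \r\n";
        // Windows default separator: prints Line 1 \r\r\nLine 2 \r\r\n
        System.out.println(visible(writeWithSeparator(input, "\r\n")));
        // Separator overridden to "\n": prints Line 1 \r\nLine 2 \r\n
        System.out.println(visible(writeWithSeparator(input, "\n")));
    }
}
```

Under this model, overriding the separator to "\n" leaves input that already contains \r\n unchanged, which matches the fix suggested in the answer.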