Problem description
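I'm trying to parse a string containing XML 1.1-compliant XML content. The XML contains character references that are not allowed by the XML 1.0 specification but are allowed by the XML 1.1 specification (the character references resolve to Unicode characters in the range U+0001–U+001F).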
According to the Xerces2 website, the Xerces2 parser supports parsing XML 1.1 documents. However, I cannot figure out how to tell it that the XML we are trying to parse is 1.1-compliant.
I'm using a DocumentBuilder to parse the XML (something like this):
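import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public Element parseString(String xmlString) {
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = dbf.newDocumentBuilder();
        InputSource source = new InputSource(new StringReader(xmlString));
        // Throws org.xml.sax.SAXParseException because of the invalid character refs
        Document doc = documentBuilder.parse(source);
        return doc.getDocumentElement();
    } catch (ParserConfigurationException pce) {
        // Handle the error
    } catch (SAXException se) {
        // Handle the error
    } catch (IOException ioe) {
        // Handle the error
    }
    // Reached only when parsing failed; the method must still return a value
    return null;
}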
I've tried setting the XML header to indicate the XML conforms to the 1.1 spec...
xmlString = "<?xml version=\"1.1\" encoding=\"UTF-8\" ?>" + xmlString;
...but it is still parsed as 1.0 XML (still generates the invalid character reference exceptions).
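For comparison, here is a minimal sketch of the same approach, assuming the JDK's built-in JAXP parser (which is derived from Xerces) and an input fragment that carries no XML declaration of its own. `Xml11ParseDemo` and `parseAsXml11` are hypothetical names. The quotes inside the Java string literal are escaped so the code compiles, and the DOM Level 3 method `Document.getXmlVersion()` reports which version the parser actually read.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Xml11ParseDemo {

    // Hypothetical helper. Assumes `body` has NO XML declaration of its own:
    // the declaration must be the very first thing in the document, so
    // prepending a second one would make the input malformed.
    public static Document parseAsXml11(String body) throws Exception {
        // Quotes inside the Java literal must be escaped.
        String xml = "<?xml version=\"1.1\" encoding=\"UTF-8\"?>" + body;
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // &#x1; is a restricted character: not allowed in XML 1.0,
        // but allowed as a character reference in XML 1.1.
        Document doc = parseAsXml11("<root>&#x1;</root>");
        System.out.println(doc.getXmlVersion());
        // Print the code point of the decoded character.
        System.out.println((int) doc.getDocumentElement().getTextContent().charAt(0));
    }
}
```

With a parser configuration that supports XML 1.1, this should print `1.1` followed by `1`. If `&#x1;` is still rejected, the parser on the classpath may not be using a 1.1-capable configuration.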
How can I configure the Xerces parser to parse the XML as XML 1.1? Is there an alternative parser which provides better support for XML 1.1?
Recommended answer
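See the Xerces features page for a list of all the features Xerces supports. The two features below may be the ones you need. Note that the second one is read-only (see its Access line), so it can only be queried, not turned on.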
http://xml.org/sax/features/unicode-normalization-checking
True: Perform Unicode normalization checking (as described in section 2.13 and Appendix B of the XML 1.1 Recommendation) and report normalization errors.
False: Do not report Unicode normalization errors.
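A minimal sketch of turning this feature on through the standard SAX `XMLReader.setFeature` call. Whether a particular parser build accepts the value `true` is not guaranteed, so the sketch catches the two SAX feature exceptions instead of assuming success. `NormalizationCheckingDemo` and `tryEnableNormalizationChecking` are hypothetical names.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class NormalizationCheckingDemo {

    static final String NORMALIZATION_CHECKING =
            "http://xml.org/sax/features/unicode-normalization-checking";

    // Tries to enable the feature; returns true only if the reader accepted it.
    public static boolean tryEnableNormalizationChecking(XMLReader reader) {
        try {
            reader.setFeature(NORMALIZATION_CHECKING, true);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // The parser does not know this feature, or refuses the value.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println("normalization checking enabled: "
                + tryEnableNormalizationChecking(reader));
    }
}
```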
http://xml.org/sax/features/xml-1.1
True: The parser supports both XML 1.0 and XML 1.1.
False: The parser supports only XML 1.0.
Access: read-only
Since: Xerces-J 2.7.0
Note: The value of this feature will depend on whether the parser configuration owned by the SAX parser is known to support XML 1.1.
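Because the feature is read-only, the useful operation is to query it with `XMLReader.getFeature`. A minimal sketch, assuming a JAXP `SAXParserFactory` on the classpath. `Xml11FeatureCheck` and `supportsXml11` are hypothetical names, and the method returns `null` when the parser does not recognize the feature at all (for example, a non-Xerces parser).

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class Xml11FeatureCheck {

    static final String XML_11_FEATURE = "http://xml.org/sax/features/xml-1.1";

    // Returns the feature value, or null if this parser cannot report it.
    public static Boolean supportsXml11() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        try {
            // Read-only: querying is the only meaningful operation here.
            return reader.getFeature(XML_11_FEATURE);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("xml-1.1 feature: " + supportsXml11());
    }
}
```

A `true` result suggests the parser's configuration can handle XML 1.1 documents; a `false` or `null` result points toward switching to a 1.1-capable parser configuration or a different parser.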