Problem description
I've been beating my head against this absolutely infuriating bug for the last 48 hours, so I thought I'd finally throw in the towel and try asking here before I throw my laptop out the window.
I'm trying to parse the response XML from a call I made to AWS SimpleDB. The response is coming back on the wire just fine; for example, it may look like:
<?xml version="1.0" encoding="utf-8"?>
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
<ListDomainsResult>
<DomainName>Audio</DomainName>
<DomainName>Course</DomainName>
<DomainName>DocumentContents</DomainName>
<DomainName>LectureSet</DomainName>
<DomainName>MetaData</DomainName>
<DomainName>Professors</DomainName>
<DomainName>Tag</DomainName>
</ListDomainsResult>
<ResponseMetadata>
<RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId>
<BoxUsage>0.0000071759</BoxUsage>
</ResponseMetadata>
</ListDomainsResponse>
I use
XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(response.getContent());
and call eventReader.nextEvent();
a bunch of times to get the data I want.
Here's the bizarre part -- it works great inside the local server. The response comes in, I parse it, everyone's happy. The problem is that when I deploy the code to Google App Engine, the outgoing request still works, and the response XML seems 100% identical and correct to me, but the response fails to parse with the following exception:
com.amazonaws.http.HttpClient handleResponse: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.): <?xml version="1.0" encoding="utf-8"?>
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><ListDomainsResult><DomainName>Audio</DomainName><DomainName>Course</DomainName><DomainName>DocumentContents</DomainName><DomainName>LectureSet</DomainName><DomainName>MetaData</DomainName><DomainName>Professors</DomainName><DomainName>Tag</DomainName></ListDomainsResult><ResponseMetadata><RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId><BoxUsage>0.0000071759</BoxUsage></ResponseMetadata></ListDomainsResponse>
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:153)
... (rest of lines omitted)
I have double, triple, quadruple checked this XML for 'invisible characters' or non-UTF8 encoded characters, etc. I looked at it byte-by-byte in an array for byte-order-marks or something of that nature. Nothing; it passes every validation test I could throw at it. Even stranger, it happens if I use a Saxon-based parser as well -- but ONLY on GAE, it always works fine in my local environment.
It makes it very hard to trace the code for problems when I can only run the debugger on an environment that works perfectly (I haven't found any good way to remotely debug on GAE). Nevertheless, using the primitive means I have, I've tried a million approaches including:
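- XML with and without the prolog
- With and without newlines
- With and without the encoding="" attribute in the prolog
- Both newline styles
- With and without the chunking information present in the HTTP stream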
And I've tried most of these in multiple combinations where it made sense they would interact -- nothing! I'm at my wit's end. Has anyone seen an issue like this before that can hopefully shed some light on it?
Thanks!
Recommended answer
The encodings declared in your XML and your XSD (or DTD) are different.
XML file header: <?xml version='1.0' encoding='utf-8'?>
XSD file header: <?xml version='1.0' encoding='utf-16'?>
Another possible cause is that something comes before the XML declaration, i.e., you might have something like this in the buffer:
helloworld<?xml version="1.0" encoding="utf-8"?>
or even a space or special character.
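One way to check for such leading content is to dump the first few bytes of the raw response in the failing environment and compare them with what the local server sees. The sketch below is only illustrative: the class and method names (PrologInspector, hexPrefix, startsWithMarkup) are hypothetical, and it assumes the response bytes have already been read into an array and are UTF-8 or ASCII.

```java
public class PrologInspector {

    /** Returns the first {@code limit} bytes of {@code data} as space-separated hex. */
    public static String hexPrefix(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(limit, data.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to avoid sign extension when formatting bytes >= 0x80.
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    /** True when the very first byte is '<', i.e. nothing precedes the markup. */
    public static boolean startsWithMarkup(byte[] data) {
        return data.length > 0 && data[0] == '<';
    }
}
```

Logging hexPrefix(bytes, 8) on both environments should make a difference visible: a clean document starts with 3C 3F 78 6D 6C ("<?xml"), while a UTF-8 byte order mark shows up as EF BB BF in front of it.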
There are some special characters called byte order marks (BOMs) that could be in the buffer. Before passing the buffer to the parser, do this:
String xml = "<?xml ...";
// Drop any leading non-word characters (a BOM, for example) before the first '<'.
xml = xml.trim().replaceFirst("^([\\W]+)<", "<");
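If the response is handed to the parser as an InputStream rather than a String, the same idea can be applied at the byte level. This is a minimal sketch, assuming the stray content is a UTF-8 byte order mark (EF BB BF); the BomStripper class and skipUtf8Bom method are hypothetical names, not part of the AWS SDK or the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomStripper {

    /**
     * Wraps {@code in} and consumes a leading UTF-8 BOM if one is present.
     * Any other leading bytes are pushed back so the caller sees them unchanged.
     */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // read() may return fewer bytes than requested, so loop until 3 or EOF.
        int read = 0;
        while (read < 3) {
            int n = pushback.read(head, read, 3 - read);
            if (n < 0) {
                break;
            }
            read += n;
        }

        boolean isBom = read == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // Not a BOM: put back whatever was read so nothing is lost.
        if (!isBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }
}
```

The wrapped stream could then be passed along in place of the original, for example xmlInputFactory.createXMLEventReader(BomStripper.skipUtf8Bom(response.getContent())). It only handles the UTF-8 mark; other leading junk would still need the byte dump approach to identify.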