Question
This is not a question about how to overcome the "XML parsing: ... illegal xml character" error, but about why it happens. I know that there are fixes (1, 2, 3), but I need to know where the problem comes from before choosing the best solution (what causes the error under the hood?).
We are calling a Java-based webservice using C#. From the strongly-typed data returned, we create an XML file that will be passed to SQL Server. The webservice data is encoded using UTF-8, so in C# we create the file and specify UTF-8 where appropriate:
var encodingType = Encoding.UTF8;
// logic removed...
var xdoc = new XDocument();
xdoc.Declaration = new XDeclaration("1.0", encodingType.WebName, "yes");
// logic removed...
System.IO.File.WriteAllText(xmlFullPath, xdoc.Declaration.ToString() + xdoc.Document.ToString(), encodingType);
This creates an XML file on disk that contains the following (abbreviated) data:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<records>
<r RecordName="Option - Foo" />
<r RecordName="Option – Bar" />
</records>
Notice that the second record uses – where the first uses -. I believe the second character is an en dash.
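To make the difference between the two characters concrete, here is a small language-neutral sketch in Python (not part of the original C# code) that prints the code point, Unicode name and UTF-8 bytes of each. The variable names are purely illustrative.

```python
import unicodedata

# The two dash-like characters from the RecordName attributes above.
hyphen = "-"         # U+002D, plain ASCII
en_dash = "\u2013"   # U+2013, outside ASCII

for ch in (hyphen, en_dash):
    # Show the code point, the official Unicode name, and the UTF-8 bytes.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex(" "))
```

The hyphen is a single ASCII byte, while the en dash needs three bytes in UTF-8, which is why only the second record is sensitive to how the bytes are interpreted.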
If I open that XML file in Firefox/IE/VS2015, it opens without error. The W3C XML validator also accepts it. But SSMS 2012 does not like it:
declare @xml XML = '<?xml version="1.0" encoding="utf-8" standalone="yes"?><records>
<r RecordName="Option - Foo" />
<r RecordName="Option – Bar" />
</records>';
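For comparison with the tools that open the file cleanly, the sketch below feeds the same document, as real UTF-8 bytes, to a general-purpose XML parser. It uses Python's standard library instead of the browsers or validator mentioned above, so it only illustrates that the document is well-formed when the bytes match the declared encoding.

```python
import xml.etree.ElementTree as ET

# The abbreviated document from the question, with the en dash written as \u2013.
xml_text = (
    '<?xml version="1.0" encoding="utf-8" standalone="yes"?>'
    "<records>"
    '<r RecordName="Option - Foo" />'
    '<r RecordName="Option \u2013 Bar" />'
    "</records>"
)

# Encode to UTF-8 so the bytes agree with the encoding declaration.
xml_bytes = xml_text.encode("utf-8")

root = ET.fromstring(xml_bytes)
names = [r.get("RecordName") for r in root.findall("r")]
print(names)
```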
So why does the en dash cause the error? From my research, it would appear that
...of which the en dash is not one. An encoded version (replacing the literal – with its numeric character reference) works fine.
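The "encoded version" mentioned here is presumably a numeric character reference; that reading is an assumption, since the exact replacement text did not survive on this page. As a rough illustration, Python can produce such references for every non-ASCII character, leaving a pure-ASCII string that no longer depends on the byte encoding:

```python
# Attribute value from the second record, with a literal en dash.
original = "Option \u2013 Bar"

# Encoding to ASCII with xmlcharrefreplace swaps each non-ASCII character
# for a decimal numeric character reference.
escaped = original.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(escaped)  # Option &#8211; Bar
```

Because the result is pure ASCII, it reads the same whether the bytes are treated as UTF-8 or as a single-byte code page.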
Some people state that the en dash isn't recognised as UTF-8, and yet it is listed here: http://www.fileformat.info/info/unicode/char/2013/index.htm So, if it is a perfectly legal character, why won't SSMS read it when it is passed as XML (using UTF-8 or UTF-16)?
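Whether the character itself is legal can be checked against the `Char` production of the XML 1.0 specification. The helper below is a hypothetical sketch (the function name is invented for illustration) that encodes those ranges directly:

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # most of the Basic Multilingual Plane
        or 0xE000 <= cp <= 0xFFFD      # BMP above the surrogate block
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )

print(is_xml10_char("\u2013"))  # en dash
print(is_xml10_char("\x00"))    # NUL, for contrast
```

U+2013 falls inside the second range, so the character is legal; that points at the bytes and their interpretation, not at the en dash itself.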
Recommended answer
Can you modify the XML encoding declaration? If so:
declare @xml XML = N'<?xml version="1.0" encoding="utf-16" standalone="yes"?><records>
<r RecordName="Option - Foo" />
<r RecordName="Option – Bar" />
</records>';
select @xml
(No column name)
<records><r RecordName="Option - Foo" /><r RecordName="Option – Bar" /></records>
Speculative edit
Both of these fail with "illegal xml character":
set @xml = '<?xml version="1.0" encoding="utf-8"?><x> – </x>'
set @xml = '<?xml version="1.0" encoding="utf-16"?><x> – </x>'
because they pass a non-Unicode varchar to the XML parser. The string contains Unicode, so it must be treated as such, i.e. as an nvarchar (UTF-16). Otherwise the 3 bytes comprising the – are misinterpreted as multiple characters, and one or more of them is not in the acceptable range for XML.
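The general effect of reading bytes under the wrong encoding can be sketched outside SQL Server. The example below assumes Windows-1252 as the single-byte code page, which is an assumption for illustration only; the code page a varchar actually uses depends on the collation in effect.

```python
en_dash = "\u2013"

# In UTF-8 the en dash is three bytes.
utf8_bytes = en_dash.encode("utf-8")

# Read through a single-byte code page, those three bytes become three characters.
misread = utf8_bytes.decode("cp1252")
print(len(utf8_bytes), [f"U+{ord(c):04X}" for c in misread])

# The reverse mismatch: the code page's single byte for the en dash
# is not a valid UTF-8 sequence on its own.
cp1252_bytes = en_dash.encode("cp1252")
try:
    cp1252_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(cp1252_bytes.hex(), decoded_ok)
```

Either direction of mismatch leaves the parser looking at something other than a single en dash, which fits the explanation that the string must reach the parser as Unicode.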
This one does pass an nvarchar string to the parser, but fails with "unable to switch the encoding":
set @xml = N'<?xml version="1.0" encoding="utf-8"?><x> – </x>'
This is because an nvarchar (UTF-16) string is passed to the XML parser, but the XML document declares itself as UTF-8, and the – is not represented by the same bytes in the two encodings.
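The byte-level difference between the two encodings is easy to see in isolation. This short sketch only prints the encoded forms; little-endian UTF-16 is used here as an illustrative choice of byte order.

```python
en_dash = "\u2013"

as_utf8 = en_dash.encode("utf-8")       # three bytes
as_utf16 = en_dash.encode("utf-16-le")  # two bytes, little-endian

print("utf-8 :", as_utf8.hex(" "))
print("utf-16:", as_utf16.hex(" "))
```

A declaration that says UTF-8 on top of data that is physically UTF-16 therefore describes bytes that are not there, so the declaration and the string type have to agree.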
This works because everything is UTF-16:
set @xml = N'<?xml version="1.0" encoding="utf-16"?><x> – </x>'