本文介绍了SQL Server XML 列性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

将包含 XML 的 nText 列转换为 XML 数据类型会导致 SQL Server 的性能下降.

Converting nText columns which contained XML to the XML data type has resulted in worse performance in SQL Server.

我目前正在从事一个项目,其中 nText 列已用于存储有效的 XML.我已成功将这些列迁移到 XML 数据类型.然而,根据 SQL Profiler,XML 数据类型的性能比使用 nText 或 nvarchar(max) 来存储 XML 差.我所读到的一切都暗示这不应该是这种情况.

I am currently working on a project where nText columns have been used to store valid XML. I have successfully migrated these columns to the XML data type. However according to SQL Profiler the performance of the XML data type is worse than using nText or nvarchar(max) to store the XML. Everything I have read implies that this should not be the case.

为了验证这一点,我创建了两个具有相同索引等的表

In order to verify this I created two tables with the same indexes etc

Table Name Order1
[id] [int] IDENTITY(1,1) NOT NULL,
[uid] [varchar](36) NOT NULL,
[AffiliateId] [varchar](36) NOT NULL,
[Address] [ntext] NOT NULL,
[CustomProperties] [ntext] NOT NULL,
[OrderNumber] [nvarchar](50) NOT NULL,
...

Table Name Order2
[id] [int] IDENTITY(1,1) NOT NULL,
[uid] [varchar](36) NOT NULL,
[AffiliateId] [varchar](36) NOT NULL,
[Address] [xml] NOT NULL,
[CustomProperties] [xml] NOT NULL,
[OrderNumber] [nvarchar](50) NOT NULL,
...

然后,我使用 select/insert 语句复制了数据,并在两个表上重建了索引.然后我使用以下 SQL 创建了一个脚本.

I have then copied the data using a select/insert statement and rebuilt the indexes on both the tables. I then created a script with the following SQL.

DBCC DROPCLEANBUFFERS
GO
--Part1
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'F96045F8-A2BD-4C02-BECB-6EF22C9E473F'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'A3B71348-EB68-4600-9550-EC2CF75698F4'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'CB114D91-F000-4553-8AFE-FC20CF6AD8C0'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = '06274E4F-E233-4594-B505-D4BAA3770F0A'

DBCC DROPCLEANBUFFERS
GO
--Part2
Select id, uid, AffiliateId, Address, OrderNumber,
CAST(CustomProperties AS xml).query('CustomProperty/Key[text()="AgreedToTerms"]/../Value/text()')  as "TermsAgreed"
from Order1

DBCC DROPCLEANBUFFERS
GO
--Part3
Insert Into Order1 uid, AffiliateId, Address, CustomProperties, OrderNumber
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = 'F96045F8-A2BD-4C02-BECB-6EF22C9E473F'

Insert Into Order1 uid, AffiliateId, Address, CustomProperties, OrderNumber
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = 'A3B71348-EB68-4600-9550-EC2CF75698F4'

Insert Into Order1 uid, AffiliateId, Address, CustomProperties, OrderNumber
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where  uid = 'CB114D91-F000-4553-8AFE-FC20CF6AD8C0'

Insert Into Order1 uid, AffiliateId, Address, CustomProperties, OrderNumber
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = '06274E4F-E233-4594-B505-D4BAA3770F0A'

DBCC DROPCLEANBUFFERS
GO
-- Part4 This updates a .5M row table.
Update [dbo].[Order1] Set CustomProperties = Cast(CustomProperties as NVARCHAR(MAX)) + CAST('' as NVARCHAR(MAX)), Address = Cast(CustomProperties as NVARCHAR(MAX)) + CAST('' as NVARCHAR(MAX))

SQL Profiler 的结果平均结果如下:-

The results average results from the SQL Profiler are as follows:-

NTEXT

+-------+-------------+-------------+-------------+-------------+
| Test  |     CPU     |    Reads    |   Writes    |  Duration   |
+-------+-------------+-------------+-------------+-------------+
| Part1 | 281.3333333 | 129.3333333 |           0 |         933 |
| Part2 | 78421.66667 |     5374306 | 10.66666667 | 47493.66667 |
| Part3 | 281.6666667 |         616 | 27.66666667 | 374.6666667 |
| Part4 | 40312.33333 | 15311252.67 |      320662 |       67010 |
| Total |             |             |             | 115811.3333 |
+-------+-------------+-------------+-------------+-------------+


XML

+-------+-------------+-------------+-------------+-------------+
| Test  |     CPU     |    Reads    |   Writes    |  Duration   |
+-------+-------------+-------------+-------------+-------------+
| Part1 |         282 | 58.33333333 |           0 | 949.3333333 |
| Part2 | 21129.66667 | 180143.3333 |           0 | 76048.66667 |
| Part3 |         297 | 370.3333333 | 14.66666667 |         378 |
| Part4 | 112578.3333 | 8908940.667 | 145703.6667 | 114684.3333 |
| Total |             |             |             | 192060.3333 |
+-------+-------------+-------------+-------------+-------------+

测试脚本有缺陷吗?或者是否需要对 https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2005/administrator/ms345115(v=sql.90)

Is the test script flawed? Or is there some other optimisation that needs to be carried out for xml data type columns out side of https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2005/administrator/ms345115(v=sql.90)

我希望 XML 列类型的性能优于 ntext.

I would expect the XML column type to outperform ntext.

推荐答案

所以这可能不是答案,至少不是解决方案,但它有望帮助理解正在发生的事情...

So this might not be an answer, at least not a solution, but it will hopefully help to understand what's going on...

XML 最昂贵的部分是初始解析,换句话说:文本表示和技术存储之间的转换.

The most expensive part with XML is the initial parsing, put in other words: The transformation between the textual representation and the technical storage.

重要提示:原生 XML 不是存储为您看到的文本,而是存储为层次结构表.当您将一些文本 XML 传递到 SQL-Server 时,这需要非常繁重的处理.为人类读者调用此 XML 需要相反的过程.将此字符串存储在字符串列中(请注意,NTEXT 已被弃用几个世纪)比将其存储为原生 XML 更快,但您将失去许多优势.

Important to know: Native XML is not stored as the text you see, but as hierachy table. This needs very heavy proecessing when you pass some textual XML into SQL-Server. Calling this XML for a human reader needs the opposite process. Storing this string in a string column (be aware that NTEXT is deprecated for centuries) is faster, than storing it as native XML, but you will lose many advantages.

所以你的脚本:

我假设您运行了相同的脚本,但只是将 Order1 更改为 Order2.这是正确的吗?

I assume, that you ran the same script but just changed Order1 to Order2. Is this correct?

为了提供可读的表示,SQL-Server(或者更确切地说 SSMS)会将任何值转换为某种文本.如果您的表包含 INT、GUID 或 DateTime,您将看不到实际的位模式,对吗?SSMS 将使用相当昂贵的操作来为您创建可读的内容.昂贵的部分是转型.字符串不需要这个,所以 NTEXT 会更快.

In order to offer a readable representation, SQL-Server (or rather SSMS) will transform any value to some kind of text. If your tables include INTs, GUIDs or a DateTime, you would not see the actual bit patter, would you? SSMS will use quite expensive actions to create something readable for you. The expensive part is the transformation. Strings do not need this, so NTEXT will be faster.

您是否也将 CAST( AS XML)Order2 一起使用?然而,有这样的需求,XML 应该更快,因为 NTEXT 将不得不一遍又一遍地进行繁重的解析,而 XML 已经以可查询的格式存储......).试试这个:

Did you use the CAST( AS XML) with Order2 too? However, with such a need XML should be faster, because NTEXT will have to do the heavy parsing over and over, while XML is stored in a queryable format already... But your XQuery is rather sub-optimal (due to the backward navigation ../Value). Try this:

 .query('/CustomProperty[Key[text()="AgreedToTerms"]]/Value/text()')

这将查找 ,其中有一个 和给定的内容,并将读取 低于 不需要 ../

This will look for a <CustomProperty> where there is a <Key> with the given content and will read the <Value> below <CustomProperty> without the need of ../

我肯定希望在这里使用 CAST 的 XML 的性能优于 NTEXT...第一次调用全新的表和索引可能会返回有偏差的结果...

I'd surely expect XML to outperform NTEXT with a CAST here... The very first call to completely new tables and indexes might return biased results...

在这里,我希望得到相同的性能...如果您将一个字符串值移动到另一个字符串列中,这就是简单的复制.将原生 XML 移到另一个 XML 列中也很简单.

Here I would expect rather the same performance... If you move a string value into another string column this is simple copying. Moving native XML into another XML column is simple copying too.

这看起来很奇怪……你想达到什么目的?代码需要将您的原生 XML 转换为字符串并重新解析它们以存储在 XML 中.用 NTEXT 做同样的事情根本不需要这些昂贵的操作......

This looks rather weird... What are you trying to achieve? The code needs to tranform your native XMLs to strings and re-parse them to be stored in XML. Doing the same with NTEXT will not need these expensive actions at all...

  1. 如果您从外部获得一些 XML,从文件中读取它并且您只需要查询一次,字符串类型的字符串方法可以更快,但是:如果您想永久存储 XML 以便更频繁地使用和操作它们的值,那么原生 XML 类型会好得多.
  2. 在许多情况下,绩效衡量指标并不能衡量您认为自己做了什么...
  3. 尝试以某种方式创建您的测试,结果的呈现 不是测试的一部分(例如,对临时表执行 INSERT,停止时钟并推送临时表的输出)
  1. If you get some XML from outside, read it from a file and you need to query it just once, string methods on string types can be faster, but: If you want to store XML permanently in order to use and manipulate their values more often, the native XML type will be much better.
  2. In many cases performance measures do not measure what you think you do...
  3. Try to create your tests in a way, that the presentation of the results is not part of the test (e.g. do an INSERT against a temp table, stop the clock and push the output from the temp table)

更新第 2 部分"的另一个测试

试试这个测试脚本:

UPDATE Another test for "Part 2"

Try this test script:

USE master;
GO
CREATE DATABASE testShnugo;
GO
USE testShnugo;
GO
CREATE TABLE dbo.WithString(ID INT,SomeXML NTEXT);
CREATE TABLE dbo.WithXML(ID INT,SomeXML XML);
GO
--insert 100.000 rows to both tables
WITH Tally(Nmbr) AS (SELECT TOP 100000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values v1 CROSS JOIN master..spt_values v2)
INSERT INTO dbo.WithXML(ID,SomeXML)
SELECT Nmbr,(SELECT Nmbr AS [@nmbr],CONCAT('hallo',Nmbr) AS [SomeTest/@FindMe],CONCAT('SomeTestValue',Nmbr) As [SomeTest] FOR XML PATH('row'),ROOT('root'),TYPE)
FROM Tally
--copy everything to the second table
INSERT INTO dbo.WithString(ID,SomeXML) SELECT ID,CAST(SomeXML AS NVARCHAR(MAX)) FROM dbo.WithXML;
GO
--check the actual content
SELECT * FROM dbo.WithString;
SELECT * FROM dbo.WithXML;
GO

DECLARE @d DATETIME2=SYSUTCDATETIME();
SELECT * FROM dbo.WithString WHERE SomeXML LIKE '%FindMe="hallo333"%'
PRINT 'String-Method LIKE '
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithString WHERE CAST(SomeXML AS xml).exist('/root/row[SomeTest[@FindMe="hallo333"]]')=1
PRINT 'CAST NTEXT to XML and .exist()'
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithXML WHERE CAST(SomeXML AS nvarchar(MAX)) LIKE '%FindMe="hallo333"%'
PRINT 'String-Method LIKE after CAST XML to NVARCHAR(MAX)'
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithXML WHERE SomeXML.exist('/root/row[SomeTest[@FindMe="hallo333"]]')=1
PRINT 'native XML with .exist()'
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());

GO
USE master;
GO
DROP DATABASE testShnugo;

首先我创建表格并用 100.000 个这样的 XML 填充它们

First I create tables and fill them with 100.000 XMLs like this

<root>
  <row nmbr="1">
    <SomeTest FindMe="hallo1">SomeTestValue1</SomeTest>
  </row>
</root>

我的结果

String-Method LIKE
836

CAST NTEXT to XML and .exist()
1962

String-Method LIKE after CAST XML to NVARCHAR(MAX)
1079

native XML with .exist()
911

正如预期的那样,最快的方法是针对非常小的字符串上的字符串类型的字符串方法.但是 - 当然 - 这不会像精心设计的 XQuery 那样强大,并且无法处理 namspaces、多次出现等.

As expected the fastest approach is a string method against a string type on very tiny strings. But - of course - this will not be as mighty as an elaborated XQuery and will not be able to deal with namspaces, multiple occurances and so on.

最慢的是使用 .exist()

在转换为字符串之后针对原生 XML 的字符串方法实际上并没有那么糟糕,但这取决于 XML 的大小.这个非常小...

A string method against the native XML after a cast to string is not that bad actually, but this depends on the XML's size. This one was very tiny...

针对 100.000 个不同 XML 的 100.000 次非平凡 XQuery 调用几乎与纯字符串方法一样快.

And 100.000 non-trivial XQuery calls against 100.000 different XMLs is almost as fast as the pure string approach.

我只是通过在一行中更改上面的代码,用更大的 XML 重复了测试

I repeated the test with larger XMLs just by changing the code above in one line

    SELECT Nmbr,(SELECT TOP 100 Nmbr AS [@nmbr],CONCAT('hallo',x.Nmbr) AS [SomeTest/@FindMe],CONCAT('SomeTestValue',x.Nmbr) As [SomeTest] FROM Tally x FOR XML PATH('row'),ROOT('root'),TYPE)

现在每个 XML 都将由 100 个 元素组成.

Now each and any XML will consist of 100 <row> elements.

<root>
  <row nmbr="1">
    <SomeTest FindMe="hallo1">SomeTestValue1</SomeTest>
  </row>
  <row nmbr="2">
    <SomeTest FindMe="hallo2">SomeTestValue2</SomeTest>
  </row>
  <row nmbr="3">
    <SomeTest FindMe="hallo3">SomeTestValue3</SomeTest>
  </row>
  ...more of them

搜索 FindMe="hallo333" 这不会返回任何东西,但是找到没有什么可返回的时间对我们来说已经足够了:

With a search for FindMe="hallo333" this won't return anything, but the time to find, that there is nothing to return, is enough for us:

String-Method LIKE
71959

CAST NTEXT to XML and .exist()
74773

String-Method LIKE after CAST XML to NVARCHAR(MAX)
104380

native XML with .exist()
16374

最快的 - 到目前为止!- 现在是原生 XML.由于字符串的大小,字符串方法会丢失.

The fastest - by far! - is now the native XML. The string approaches get lost due to the strings sizes.

也请告诉我你的结果.

这篇关于SQL Server XML 列性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-13 21:36