Question
I've just started looking into optimizing my queries with indexes because my SQL data is growing large, and fast. I looked at how the optimizer processes my query through the execution plan in SSMS and noticed that a Sort operator is being used. I've heard that a Sort operator indicates a bad design in the query, since the sort could be done ahead of time through an index. Here is an example table and data similar to what I'm working with:
IF OBJECT_ID('dbo.Store') IS NOT NULL DROP TABLE dbo.[Store]
GO
CREATE TABLE dbo.[Store]
(
[StoreId] int NOT NULL IDENTITY (1, 1),
[ParentStoreId] int NULL,
[Type] int NULL,
[Phone] char(10) NULL,
PRIMARY KEY ([StoreId])
)
INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 0, '2223334444')
INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 0, '3334445555')
INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 1, '0001112222')
INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 1, '1112223333')
GO
Here is the example query:
SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
AND ([Type] = 0 OR [Type] = 1)
ORDER BY [Phone]
I created a nonclustered index to help speed up the query:
CREATE NONCLUSTERED INDEX IX_Store ON dbo.[Store]([ParentStoreId], [Type], [Phone])
To build the IX_Store index, I start with the simple predicates:
[ParentStoreId] = 10
AND ([Type] = 0 OR [Type] = 1)
Then I add the [Phone] column for the ORDER BY and to cover the SELECT output.
So even with the index in place, the optimizer still uses the Sort operator (rather than relying on the index order), because in the index [Phone] is sorted after [ParentStoreId] and [Type]. If I remove the [Type] column from the index and run the query:
SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
--AND ([Type] = 0 OR [Type] = 1)
ORDER BY [Phone]
Then of course the optimizer does not use the Sort operator, because [Phone] is already in order within each [ParentStoreId].
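To make the ordering issue concrete, here is a small Python sketch. It is a toy model, not SQL Server's actual storage: each index is treated as a list of key tuples sorted by the index columns, using the four sample rows from the table above. The names index_with_type, index_without_type, and is_sorted are made up for illustration.

```python
# Sample rows as (ParentStoreId, Type, Phone), taken from the INSERT statements.
rows = [
    (10, 0, "2223334444"),
    (10, 0, "3334445555"),
    (10, 1, "0001112222"),
    (10, 1, "1112223333"),
]

# Toy model of an index on (ParentStoreId, Type, Phone): entries sorted by that key.
index_with_type = sorted(rows)

# Toy model of an index on (ParentStoreId, Phone): entries sorted by that key.
index_without_type = sorted((parent, phone) for parent, _type, phone in rows)

# Phones in the order a seek on ParentStoreId = 10 AND Type IN (0, 1) reads them.
phones_with_type = [
    phone for parent, type_, phone in index_with_type
    if parent == 10 and type_ in (0, 1)
]

# Phones in the order a seek on ParentStoreId = 10 reads them.
phones_without_type = [
    phone for parent, phone in index_without_type if parent == 10
]

def is_sorted(values):
    """Return True if values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))

print(phones_with_type, is_sorted(phones_with_type))        # not in Phone order
print(phones_without_type, is_sorted(phones_without_type))  # in Phone order
```

In this model, each [Type] value forms its own run that is sorted by [Phone], but the two runs placed end to end are not sorted overall. That is why a Sort is still needed when both types are requested.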
So the question is: how can I create an index that covers the query (including the [Type] predicate) without the optimizer using a Sort?
Edit:
The table I'm working with has more than 20 million rows.
Recommended answer
First, you should verify that the sort is actually a performance bottleneck. The duration of the sort depends on the number of rows to be sorted, and the number of stores for a particular parent store is likely to be small. (That assumes the Sort operator is applied after the WHERE clause has filtered the rows.)
Saying that a Sort operator indicates bad design is an over-generalization. Often, a sort can trivially be moved into the index, and if only the first few rows of the result set are fetched, doing so can substantially reduce query cost: the database no longer has to fetch all matching rows (and sort them all) to find the first ones, but can read the records in result-set order and stop once enough records are found.
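The early-stop effect can be sketched in Python. This is only an illustration of the idea, not of how SQL Server executes a plan; the helper counting_scan and the 1000 generated phone numbers are hypothetical.

```python
from itertools import islice

def counting_scan(entries, counter):
    """Yield entries one at a time, counting how many are actually read."""
    for entry in entries:
        counter["rows_read"] += 1
        yield entry

# Hypothetical phone numbers for one parent store (1000 matching rows).
phones_in_index_order = [f"{n:010d}" for n in range(1000)]
phones_unordered = list(reversed(phones_in_index_order))

# With a usable index: read in Phone order and stop after the first 3 rows.
with_index = {"rows_read": 0}
top3_from_index = list(islice(counting_scan(phones_in_index_order, with_index), 3))

# Without it: read every matching row, sort them all, then keep the first 3.
with_sort = {"rows_read": 0}
top3_from_sort = sorted(counting_scan(phones_unordered, with_sort))[:3]

print(with_index["rows_read"], with_sort["rows_read"])  # prints "3 1000"
```

Both approaches return the same three rows. The difference is how many rows had to be read first, which is where the saving comes from when only the top of the result set is needed.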
In your case, you seem to be fetching the entire result set, so sorting it is unlikely to make things much worse (unless the result set is huge). Also, in your case it might not be trivial to build a useful sorted index, because the WHERE clause contains an OR.
Now, if you still want to get rid of that Sort operator, you can try:
SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
AND [Type] in (0, 1)
ORDER BY [Phone]
Alternatively, you can try the following index:
CREATE NONCLUSTERED INDEX IX_Store ON dbo.[Store]([ParentStoreId], [Phone], [Type])
This tries to get the query optimizer to do an index range scan on ParentStoreId only, then scan all matching rows in the index, outputting them if Type matches. However, this is likely to cause more disk I/O, and hence slow your query down rather than speed it up.
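As a rough illustration of that trade-off, the Python sketch below models an index on (ParentStoreId, Phone, Type) as a sorted list of tuples. It is a toy model only. The two rows with Type 2 and 3 are hypothetical additions to the sample data, included to show rows that are read and then discarded.

```python
# Rows as (ParentStoreId, Type, Phone). The last two are hypothetical extras.
rows = [
    (10, 0, "2223334444"),
    (10, 0, "3334445555"),
    (10, 1, "0001112222"),
    (10, 1, "1112223333"),
    (10, 2, "4445556666"),
    (10, 3, "0000001111"),
]

# Toy model of an index on (ParentStoreId, Phone, Type).
index_phone_then_type = sorted((parent, phone, type_) for parent, type_, phone in rows)

rows_examined = 0
result = []
for parent, phone, type_ in index_phone_then_type:
    if parent != 10:
        continue  # outside the range scan on ParentStoreId
    rows_examined += 1  # every row for this parent is read...
    if type_ in (0, 1):
        result.append(phone)  # ...but only rows with a matching Type are output

print(result)         # already in Phone order, no sort step
print(rows_examined)  # more rows examined than returned
```

The output arrives in [Phone] order without a separate sort, but every row for the parent store is examined, including those whose [Type] does not match. With many non-matching rows, that extra reading is the additional I/O described above.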
Edit: As a last resort, you could use
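-- Run these as two separate statements: T-SQL does not allow ORDER BY
-- on the individual branches of a UNION ALL.
SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
AND [Type] = 0
ORDER BY [Phone];

SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
AND [Type] = 1
ORDER BY [Phone];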
with
CREATE NONCLUSTERED INDEX IX_Store ON dbo.[Store]([ParentStoreId], [Type], [Phone])
and combine the two lists on the application server, where you can merge the presorted lists (as in the merge step of merge sort), thereby avoiding a complete sort. But that's really a micro-optimization: while it may speed up the sort itself by an order of magnitude, it is unlikely to affect the total execution time of the query much, as I'd expect the bottleneck to be network and disk I/O, especially since the disk will do a lot of random access because the index is not clustered.
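The application-side merge could look like the following Python sketch, which uses the standard library's heapq.merge. The two lists stand in for the result sets of the per-type queries, filled with the sample data. It assumes the application compares phone strings the same way the database collation orders them, which should hold for digit-only values like these.

```python
import heapq

# Stand-ins for the two result sets, each already sorted by Phone.
phones_type_0 = ["2223334444", "3334445555"]
phones_type_1 = ["0001112222", "1112223333"]

# heapq.merge walks the presorted inputs in step and yields one sorted
# stream, so no full sort of the combined list is needed.
merged = list(heapq.merge(phones_type_0, phones_type_1))

print(merged)
```

heapq.merge is lazy, so the application can also consume the merged stream row by row instead of materializing it as a list.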