问题描述
按随机顺序获取 top n
行的最佳方法是什么?
我使用这样的查询:
What is the best way to get top n
rows by random order?
I use the query like:
Select top(10) field1,field2 .. fieldn
from Table1
order by checksum(newid())
上述查询中的问题是随着表大小的增加,它会变得越来越慢.它将始终执行完整的聚集索引扫描以按随机顺序查找 top(10)
行.
The problem in the above query is that it will keep getting slower as table size increases. it will always do full clustered index scan to find top(10)
rows by random order.
还有其他更好的方法吗?
Is there any other better way to do it?
推荐答案
我已经对此进行了测试,并在更改查询时获得了更好的性能.
I have tested this and got better performance changing the query.
我在测试中使用的表的 DDL.
The DDL for the table I used in my tests.
CREATE TABLE [dbo].[TestTable]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[Col1] [nvarchar](100) NOT NULL,
[Col2] [nvarchar](38) NOT NULL,
[Col3] [datetime] NULL,
[Col4] [nvarchar](50) NULL,
[Col5] [int] NULL,
CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)
)
GO
CREATE NONCLUSTERED INDEX [IX_TestTable_Col5] ON [dbo].[TestTable]
(
[Col5] ASC
)
该表有 722888 行.
The table has 722888 rows.
第一次查询:
select top 10
T.ID,
T.Col1,
T.Col2,
T.Col3,
T.Col5,
T.Col5
from TestTable as T
order by newid()
第一次查询的统计:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 13 ms.
(10 row(s) affected)
Table 'TestTable'. Scan count 1, logical reads 12492, physical reads 14, read-ahead reads 6437, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 859 ms, elapsed time = 1700 ms.
执行计划第一次查询:
第二次查询:
select
T.ID,
T.Col1,
T.Col2,
T.Col3,
T.Col5,
T.Col5
from TestTable as T
inner join (select top 10 ID
from TestTable
order by newid()) as C
on T.ID = C.ID
第二次查询的统计:
SQL Server parse and compile time:
CPU time = 125 ms, elapsed time = 183 ms.
(10 row(s) affected)
Table 'TestTable'. Scan count 1, logical reads 1291, physical reads 10, read-ahead reads 399, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 516 ms, elapsed time = 706 ms.
执行计划第二次查询:
总结:
第二个查询使用 Col5
上的索引按 newid()
对行进行排序,然后执行聚集索引查找 10 次以获取输出.
The second query is using the index on Col5
to order the rows by newid()
and then it does a Clustered Index Seek 10 times to get the values for the output.
性能提升是因为 Col5
上的索引比聚集键窄,从而导致读取次数减少.
The performance gain is there because the index on Col5
is narrower than the clustered key and that causes fewer reads.
感谢 Martin Smith 指出这一点.
这篇关于性能随机排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!