我有一个表,该表是来自SQL Server 2012数据库(annonsid, annonsid2
)中对象的链接表。该表用于创建三角形甚至矩形的链,以查看谁可以与谁交换。
这是我在表Matching_IDs
上使用的查询,该表中包含1,500万行,使用此查询产生了1400万个可能的链:
SELECT COUNT(*)
FROM Matching_IDs AS m
INNER JOIN Matching_IDs AS m2
ON m.annonsid2 = m2.annonsid
INNER JOIN Matching_IDs AS m3
ON m2.annonsid2 = m3.annonsid
AND m.annonsid = m3.annonsid2
我必须提高性能以花费大约1秒钟或更短的时间,是否有更快的方法来做到这一点?在我的计算机上查询大约需要1分钟。我通常使用
WHERE m.annonsid=x
,但是它花费的时间是相同的,因为无论如何它必须经历所有可能的组合。更新:最新的查询计划
|--Compute Scalar(DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[globalagg1011],0)))
|--Stream Aggregate(DEFINE:([globalagg1011]=SUM([partialagg1010])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1010]=Count(*)))
|--Hash Match(Inner Join, HASH:([m2].[annonsid2], [m2].[annonsid])=([m3].[annonsid], [m].[annonsid2]), RESIDUAL:([MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m2].[annonsid2]=[MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m3].[annonsid] AND [MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m].[annonsid2]=[MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m2].[annonsid]))
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m2].[annonsid2], [m2].[annonsid]))
| |--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133207] AS [m2]))
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m3].[annonsid], [m].[annonsid2]))
|--Merge Join(Inner Join, MANY-TO-MANY MERGE:([m].[annonsid])=([m3].[annonsid2]), RESIDUAL:([MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m].[annonsid]=[MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m3].[annonsid2]))
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m].[annonsid]), ORDER BY:([m].[annonsid] ASC))
| |--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133152] AS [m]), ORDERED FORWARD)
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m3].[annonsid2]), ORDER BY:([m3].[annonsid2] ASC))
|--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133207] AS [m3]), ORDERED FORWARD)
最佳答案
一些想法:
尝试两个索引(annonsid,annonsid2)和(annonsid2,annonsid)
您是否尝试过列存储索引?它使表只读,但可能会提高性能。
此外,查询的某些变体可能会有所帮助。例子:
SELECT COUNT(*)
FROM Matching_IDs AS m
INNER JOIN Matching_IDs AS m2
ON m.annonsid2 = m2.annonsid
INNER JOIN Matching_IDs AS m3
ON m2.annonsid2 = m3.annonsid
where m.annonsid = m3.annonsid2
或者
SELECT COUNT(*)
FROM Matching_IDs AS m, Matching_IDs AS m2, Matching_IDs AS m3
where m2.annonsid2 = m3.annonsid
and m.annonsid2 = m2.annonsid
and m.annonsid = m3.annonsid2
您是否检查了CPU/IO-Load?如果IO-Load高,则服务器不是在处理数字,而是交换=>更多RAM可解决此问题。
这个查询有多快?
SELECT COUNT(*)
FROM Matching_IDs AS m
INNER JOIN Matching_IDs AS m2
ON m.annonsid2 = m2.annonsid
如果这非常快,但是添加下一个连接会使速度变慢,那么您可能需要更多的RAM。
关于sql - 优化数百万行的自我连接,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/14086520/