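I have a table in a SQL Server 2012 database that links objects to each other (columns annonsid, annonsid2). The table is used to build chains, triangles or even rectangles, to see who can swap with whom.

This is the query I run against the table Matching_IDs. The table has 15 million rows, and the query returns 14 million possible chains: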

SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
  INNER JOIN Matching_IDs AS m3
     ON m2.annonsid2 = m3.annonsid
       AND m.annonsid = m3.annonsid2
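To make the count concrete, here is a small sketch that runs the same query against an in-memory SQLite table. SQLite and the sample rows are stand-ins, not the poster's setup. It shows that each directed triangle a->b->c->a is counted three times, once for each row that can play the role of m. So, assuming the real data has no self-links or duplicate rows, 14 million chains correspond to roughly a third as many distinct triangles.

```python
import sqlite3

# Hypothetical sample links: two directed 3-cycles (1->2->3->1 and 4->5->6->4)
# plus an extra link 1->4 that closes no cycle.
ROWS = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4), (1, 4)]

# The query from the question, unchanged.
TRIANGLE_QUERY = """
SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
  INNER JOIN Matching_IDs AS m3
     ON m2.annonsid2 = m3.annonsid
       AND m.annonsid = m3.annonsid2
"""

# In-memory SQLite stand-in for the SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Matching_IDs (annonsid INTEGER, annonsid2 INTEGER)")
conn.executemany("INSERT INTO Matching_IDs VALUES (?, ?)", ROWS)

chains = conn.execute(TRIANGLE_QUERY).fetchone()[0]
# A cycle a->b->c->a matches once per rotation: (m, m2, m3) can start at a, b or c.
# Dividing by 3 is only valid here because the sample has no self-links or duplicates.
distinct_triangles = chains // 3
print(chains, distinct_triangles)  # 6 2
```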

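I need to get this down to about 1 second or less. Is there a faster way to do it? The query takes about 1 minute on my machine. I normally add WHERE m.annonsid = x, but it takes just as long, because it has to go through all possible combinations anyway.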
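In principle a filter on m.annonsid should help a lot. Once x is fixed, the answer depends only on x's outgoing links, their outgoing links, and whether one of those leads back to x. The sketch below walks exactly that neighbourhood and checks the result against the filtered SQL. It uses Python with SQLite, and `build_index` and `cycles_from` are hypothetical helpers. If the filtered query still takes as long on SQL Server, a likely explanation is that the engine is scanning rather than seeking, for example because no index leads with the filtered and joined columns.

```python
import sqlite3
from collections import Counter, defaultdict

ROWS = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4), (1, 4)]  # hypothetical sample

ANCHORED_QUERY = """
SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
  INNER JOIN Matching_IDs AS m3
     ON m2.annonsid2 = m3.annonsid
       AND m.annonsid = m3.annonsid2
WHERE m.annonsid = ?
"""

def build_index(rows):
    """One pass over all rows, playing the role of an index on (annonsid, annonsid2)."""
    out = defaultdict(list)   # annonsid -> every annonsid2 it links to (duplicates kept)
    edge_count = Counter()    # (annonsid, annonsid2) -> number of identical rows
    for a, b in rows:
        out[a].append(b)
        edge_count[(a, b)] += 1
    return out, edge_count

def cycles_from(index, x):
    """Count chains x->b->c->x touching only x's neighbourhood, like index seeks would."""
    out, edge_count = index
    total = 0
    for b in out.get(x, []):             # m:  x -> b
        for c in out.get(b, []):         # m2: b -> c
            total += edge_count[(c, x)]  # m3: c -> x (Counter returns 0 if absent)
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Matching_IDs (annonsid INTEGER, annonsid2 INTEGER)")
conn.executemany("INSERT INTO Matching_IDs VALUES (?, ?)", ROWS)
index = build_index(ROWS)

for x in (1, 4, 7):
    sql_count = conn.execute(ANCHORED_QUERY, (x,)).fetchone()[0]
    print(x, sql_count, cycles_from(index, x))
```

The per-query work in `cycles_from` grows with the number of links around x, not with the size of the table. That is the behaviour one would hope an index seek gives the filtered SQL query.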

Update: latest query plan
|--Compute Scalar(DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[globalagg1011],0)))
   |--Stream Aggregate(DEFINE:([globalagg1011]=SUM([partialagg1010])))
        |--Parallelism(Gather Streams)
             |--Stream Aggregate(DEFINE:([partialagg1010]=Count(*)))
                  |--Hash Match(Inner Join, HASH:([m2].[annonsid2], [m2].[annonsid])=([m3].[annonsid], [m].[annonsid2]), RESIDUAL:([MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m2].[annonsid2]=[MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m3].[annonsid] AND [MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m].[annonsid2]=[MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m2].[annonsid]))
                       |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m2].[annonsid2], [m2].[annonsid]))
                       |    |--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133207] AS [m2]))
                       |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m3].[annonsid], [m].[annonsid2]))
                            |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([m].[annonsid])=([m3].[annonsid2]), RESIDUAL:([MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m].[annonsid]=[MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m3].[annonsid2]))
                                 |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m].[annonsid]), ORDER BY:([m].[annonsid] ASC))
                                 |    |--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133152] AS [m]), ORDERED FORWARD)
                                 |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m3].[annonsid2]), ORDER BY:([m3].[annonsid2] ASC))
                                      |--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133207] AS [m3]), ORDERED FORWARD)

Best answer

Some ideas:

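Try two indexes, (annonsid, annonsid2) and (annonsid2, annonsid).

Have you tried a columnstore index? In SQL Server 2012 it makes the table read-only, but it might improve performance.

Some variations of the query might also help. Examples: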
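Indexing both column orders lets each join step be answered by a seek. m2 can be found by its annonsid, and m3 by (annonsid, annonsid2). The illustration below uses SQLite's planner in place of SQL Server's, and the index names are made up. In T-SQL the equivalent would be a CREATE NONCLUSTERED INDEX on dbo.Matching_IDs (annonsid, annonsid2), plus a second one with the columns swapped. For the filtered query, EXPLAIN QUERY PLAN typically reports index searches, although the exact wording varies between SQLite versions. This contrasts with the Index Scan operators in the plan posted in the question. Which of the two indexes gets used depends on the join order the optimizer picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Matching_IDs (annonsid INTEGER, annonsid2 INTEGER)")
conn.executemany("INSERT INTO Matching_IDs VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 1), (1, 4)])  # hypothetical sample rows

# The two suggested composite indexes; the names are hypothetical.
conn.execute("CREATE INDEX IX_Matching_a_a2 ON Matching_IDs (annonsid, annonsid2)")
conn.execute("CREATE INDEX IX_Matching_a2_a ON Matching_IDs (annonsid2, annonsid)")

plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
  INNER JOIN Matching_IDs AS m3
     ON m2.annonsid2 = m3.annonsid
       AND m.annonsid = m3.annonsid2
WHERE m.annonsid = ?
""", (1,)).fetchall()

# The last column of each row is the human-readable step description,
# e.g. a "SEARCH ... USING COVERING INDEX ..." line per table alias.
for row in plan:
    print(row[-1])
```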

SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
  INNER JOIN Matching_IDs AS m3
     ON m2.annonsid2 = m3.annonsid
where m.annonsid = m3.annonsid2

Or:
SELECT COUNT(*)
FROM Matching_IDs AS m, Matching_IDs AS m2, Matching_IDs AS m3
where m2.annonsid2 = m3.annonsid
  and m.annonsid2 = m2.annonsid
  and m.annonsid = m3.annonsid2
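All three forms express the same inner joins, so they must return the same count. Moving the closing predicate from ON to WHERE, or switching to comma joins, can only matter if the optimizer happens to pick a different plan. SQL Server tends to normalize such rewrites to the same plan. The sketch below checks that the variants agree, using SQLite and random sample data as assumptions. Whether one of them is actually faster has to be measured on the SQL Server instance itself.

```python
import random
import sqlite3

random.seed(0)
# Hypothetical random links between 20 ids; duplicates and self-links are allowed.
rows = [(random.randint(1, 20), random.randint(1, 20)) for _ in range(150)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Matching_IDs (annonsid INTEGER, annonsid2 INTEGER)")
conn.executemany("INSERT INTO Matching_IDs VALUES (?, ?)", rows)

VARIANTS = {
    "closing predicate in ON": """
        SELECT COUNT(*)
        FROM Matching_IDs AS m
          INNER JOIN Matching_IDs AS m2 ON m.annonsid2 = m2.annonsid
          INNER JOIN Matching_IDs AS m3 ON m2.annonsid2 = m3.annonsid
                                       AND m.annonsid = m3.annonsid2""",
    "closing predicate in WHERE": """
        SELECT COUNT(*)
        FROM Matching_IDs AS m
          INNER JOIN Matching_IDs AS m2 ON m.annonsid2 = m2.annonsid
          INNER JOIN Matching_IDs AS m3 ON m2.annonsid2 = m3.annonsid
        WHERE m.annonsid = m3.annonsid2""",
    "comma joins": """
        SELECT COUNT(*)
        FROM Matching_IDs AS m, Matching_IDs AS m2, Matching_IDs AS m3
        WHERE m2.annonsid2 = m3.annonsid
          AND m.annonsid2 = m2.annonsid
          AND m.annonsid = m3.annonsid2""",
}

# Run each variant and collect its count; identical semantics means identical counts.
counts = {name: conn.execute(sql).fetchone()[0] for name, sql in VARIANTS.items()}
for name, n in counts.items():
    print(f"{name}: {n}")
```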

Have you checked the CPU and IO load? If the IO load is high, the server is not crunching numbers but swapping, and more RAM would fix that.

How fast is this query?
SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid

If this is very fast but adding the next join makes it slow, you probably need more RAM.
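One way to run this staged test is to time the two-way join, then the three-way join, and compare. It also helps to know how large the two-way intermediate result is, because the third join has to process all of it. For each id v, that result contains (links into v) × (links out of v) rows, so a few heavily linked ids can make it much larger than the table itself. The sketch below uses SQLite and made-up data only to show the measurement. Absolute timings will not transfer to SQL Server.

```python
import random
import sqlite3
import time
from collections import Counter

random.seed(42)
# Hypothetical sample: 5,000 links among 500 ids.
rows = [(random.randint(1, 500), random.randint(1, 500)) for _ in range(5000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Matching_IDs (annonsid INTEGER, annonsid2 INTEGER)")
conn.executemany("INSERT INTO Matching_IDs VALUES (?, ?)", rows)
conn.execute("CREATE INDEX IX_a_a2 ON Matching_IDs (annonsid, annonsid2)")  # hypothetical name

TWO_WAY = """
SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
"""
# Same query plus the third join from the question.
THREE_WAY = TWO_WAY + """
  INNER JOIN Matching_IDs AS m3
     ON m2.annonsid2 = m3.annonsid
       AND m.annonsid = m3.annonsid2
"""

def timed_count(sql):
    """Run a COUNT(*) query and return (count, elapsed seconds)."""
    start = time.perf_counter()
    n = conn.execute(sql).fetchone()[0]
    return n, time.perf_counter() - start

two_way, t2 = timed_count(TWO_WAY)
three_way, t3 = timed_count(THREE_WAY)

# Rows produced by the two-way join: for each id v, (links into v) * (links out of v).
indeg = Counter(b for _, b in rows)
outdeg = Counter(a for a, _ in rows)
expected_two_way = sum(indeg[v] * outdeg[v] for v in indeg)

print(f"two-way join:   {two_way} rows in {t2:.4f}s (expected {expected_two_way})")
print(f"three-way join: {three_way} rows in {t3:.4f}s")
```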

Regarding "sql - Optimizing a self-join over millions of rows", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/14086520/

10-15 21:04