I have a content collection with four columns: id, timestamp, locationID, and authorID. Here is a sample of my data; in production it runs to tens of millions of rows.

id    timestamp              locationID   authorID
1     2012-03-01 11:52:00    1            1
2     2012-03-16 19:56:00    1            2
3     2012-04-02 11:26:00    2            1
4     2012-04-22 11:52:00    2            3
5     2012-05-19 09:48:00    2            2
6     2012-05-30 07:12:00    2            1
7     2012-06-04 19:17:00    1            2


I want to collect the list of authorIDs whose latest content (ordered by timestamp) matches a specific locationID.

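The correct result for a query on locationID = 2 is [ 1, 3 ], because authorID 1 and authorID 3 were most recently seen at locationID = 2, while the most recent content from authorID 2 is at locationID 1.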
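To make the expected result concrete, here is a minimal Python sketch that applies the rule to the sample rows in memory. The function name `authors_last_seen_at` and the `ROWS` list are hypothetical names used only for illustration. This is not meant as a production approach, just a way to check the definition against the sample data.

```python
from datetime import datetime

# Sample rows copied from the table above: (id, timestamp, locationID, authorID)
ROWS = [
    (1, "2012-03-01 11:52:00", 1, 1),
    (2, "2012-03-16 19:56:00", 1, 2),
    (3, "2012-04-02 11:26:00", 2, 1),
    (4, "2012-04-22 11:52:00", 2, 3),
    (5, "2012-05-19 09:48:00", 2, 2),
    (6, "2012-05-30 07:12:00", 2, 1),
    (7, "2012-06-04 19:17:00", 1, 2),
]


def authors_last_seen_at(rows, location_id):
    """Return the sorted authorIDs whose most recent row is at location_id."""
    latest = {}  # authorID -> (timestamp, locationID) of the newest row so far
    for _id, ts, loc, author in rows:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if author not in latest or when > latest[author][0]:
            latest[author] = (when, loc)
    return sorted(a for a, (_, loc) in latest.items() if loc == location_id)


print(authors_last_seen_at(ROWS, 2))  # [1, 3]
print(authors_last_seen_at(ROWS, 1))  # [2]
```

authorID 2 does have content at locationID 2 (row 5), but row 7 at locationID 1 is newer, so it is excluded from the result for locationID = 2.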

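I could of course run one query per authorID, but in production the authorID array has more than 100,000 entries. That seems extremely inefficient (especially since each "sub-query" would hit the content collection with its tens of millions of rows), so I am looking for a better way to surface this data from my dataset, ideally one fast enough to run on the database during page rendering.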

Best answer

Something like this? This comes from SQL Server, but I think it should work in MySQL as well.

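DECLARE @locationId INT;
SET @locationId = 2;

-- CBL: each author's newest timestamp at the requested location.
-- The LEFT JOIN looks for any newer row by the same author (at any location);
-- authors with no newer row (C.AuthorId IS NULL) were last seen at @locationId.
SELECT CBL.AuthorId
FROM (SELECT AuthorId, MAX(TimeStamp) AS MaxTimeStamp
    FROM Content
    WHERE LocationId = @locationId
    GROUP BY AuthorId) AS CBL
    LEFT JOIN Content AS C ON CBL.AuthorId = C.AuthorId
        AND C.TimeStamp > CBL.MaxTimeStamp
WHERE C.AuthorId IS NULL;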


For locationId = 2 it returns 1 and 3; for locationId = 1 it returns 2.
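The claim above can be checked against the sample data with a small script. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in database. This is an assumption for the sake of a self-contained example, not the SQL Server or MySQL setup from the question. The named parameter `:location_id` replaces the `@locationId` variable, and only the author column is selected. The helper name `authors_for_location` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, timestamp, locationID, authorID)
ROWS = [
    (1, "2012-03-01 11:52:00", 1, 1),
    (2, "2012-03-16 19:56:00", 1, 2),
    (3, "2012-04-02 11:26:00", 2, 1),
    (4, "2012-04-22 11:52:00", 2, 3),
    (5, "2012-05-19 09:48:00", 2, 2),
    (6, "2012-05-30 07:12:00", 2, 1),
    (7, "2012-06-04 19:17:00", 1, 2),
]

# Same anti-join shape as the answer's query: keep an author only when
# no row by that author is newer than their newest row at the location.
QUERY = """
SELECT CBL.AuthorId
FROM (SELECT AuthorId, MAX(TimeStamp) AS MaxTimeStamp
      FROM Content
      WHERE LocationId = :location_id
      GROUP BY AuthorId) AS CBL
LEFT JOIN Content AS C
       ON CBL.AuthorId = C.AuthorId
      AND C.TimeStamp > CBL.MaxTimeStamp
WHERE C.AuthorId IS NULL
ORDER BY CBL.AuthorId
"""


def authors_for_location(conn, location_id):
    """Run the query for one location and return the author ids as a list."""
    cur = conn.execute(QUERY, {"location_id": location_id})
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
# Timestamps are stored as ISO-8601 text, which compares correctly as strings.
conn.execute(
    "CREATE TABLE Content ("
    "Id INTEGER PRIMARY KEY, TimeStamp TEXT, LocationId INTEGER, AuthorId INTEGER)"
)
conn.executemany("INSERT INTO Content VALUES (?, ?, ?, ?)", ROWS)

print(authors_for_location(conn, 2))  # [1, 3]
print(authors_for_location(conn, 1))  # [2]
```

The in-memory table only confirms that the logic is correct on seven rows. It says nothing about how the query performs on tens of millions of rows.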

Per JW (thanks!), the correct MySQL version:

SET @locationId := 2;

-- Same anti-join as above, using a MySQL user variable.
SELECT CBL.AuthorId
FROM (SELECT AuthorId, MAX(TimeStamp) AS MaxTimeStamp
    FROM Content
    WHERE LocationId = @locationId
    GROUP BY AuthorId) AS CBL
    LEFT JOIN Content AS C ON CBL.AuthorId = C.AuthorId
        AND C.TimeStamp > CBL.MaxTimeStamp
WHERE C.AuthorId IS NULL;

About mysql - MySQL: finding the latest value for a list of sub-documents: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14168261/
