php - Efficiently getting the diff of large data sets?

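I need to be able to diff the results of two queries: first show the rows that are in the "old" set but not in the "new" set, then show the rows that are in the "new" set but not in the old one.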

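Right now I pull the results into arrays and then run array_diff() on them. However, I'm running into resource and timing problems, because each data set is close to 1 million rows.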

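The schema is the same in both result sets (apart from the setId number and the table's auto-increment number), so I assume there is a good way to do this directly in MySQL, but I haven't found it.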

Example Table Schema:
rowId,setId,userId,name

Example Data:
    1,1,user1,John
    2,1,user2,Sally
    3,1,user3,Tom
    4,2,user1,John
    5,2,user2,Thomas
    6,2,user4,Frank

What I need to do is find the additions/deletions between setId 1 and setId 2.

So the result of the diff should be (for the example above):

Rows that are in both setId1 and setId2
    1,1,user1,John

Rows that are in setId 1 but not in setId2
    2,1,user2,Sally
    3,1,user3,Tom

Rows that are in setId 2 but not in setId1
    5,2,user2,Thomas
    6,2,user4,Frank

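I think that's all the details, and I think I got the example right. Any help would be greatly appreciated. A solution in either MySQL or PHP is fine with me.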

Best answer

You can use exists or not exists to get the rows that are in both sets or in only one of them.

Users in set 1 but not in set 2 (for the reverse, just swap the two sets):

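-- set1 is the table holding both sets; columns follow the question's schema
select * from set1 s1
where s1.setId = 1
and not exists (
  -- select 1, not count(*): count(*) always returns a row, so not exists would never be true
  select 1 from set1 s2
  where s2.setId = 2
  and s2.userId = s1.userId
)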
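
To check the anti-join against the example data from the question, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for MySQL. The table name user_sets, the helper names, and the extra comparison on name are assumptions, not part of the answer; the name comparison is included so that user2 (Sally vs. Thomas) is reported as a difference, as the question expects.

```python
import sqlite3

# Example rows from the question: (rowId, setId, userId, name).
ROWS = [
    (1, 1, "user1", "John"),
    (2, 1, "user2", "Sally"),
    (3, 1, "user3", "Tom"),
    (4, 2, "user1", "John"),
    (5, 2, "user2", "Thomas"),
    (6, 2, "user4", "Frank"),
]

# Rows of one set that have no (userId, name) match in the other set.
# Comparing on name as well is an assumption made to match the expected diff.
ONLY_IN = """
SELECT a.rowId, a.setId, a.userId, a.name
FROM user_sets a
WHERE a.setId = ?
  AND NOT EXISTS (
      SELECT 1 FROM user_sets b
      WHERE b.setId = ?
        AND b.userId = a.userId
        AND b.name = a.name
  )
ORDER BY a.rowId
"""


def load_example():
    """Create an in-memory table (hypothetical name user_sets) with the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_sets ("
        "rowId INTEGER PRIMARY KEY, setId INTEGER, userId TEXT, name TEXT)"
    )
    conn.executemany("INSERT INTO user_sets VALUES (?, ?, ?, ?)", ROWS)
    return conn


def only_in(conn, keep_set, other_set):
    """Return rows of keep_set that have no counterpart in other_set."""
    return conn.execute(ONLY_IN, (keep_set, other_set)).fetchall()


conn = load_example()
removed = only_in(conn, 1, 2)  # in set 1 but not in set 2
added = only_in(conn, 2, 1)    # in set 2 but not in set 1
print(removed)
print(added)
```

On MySQL with around a million rows per set, a composite index on (setId, userId) would likely let the subquery be answered by index lookups rather than scans; that is worth confirming with EXPLAIN on the real table.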

Users in both sets:

-- rows of set 2 whose user also appears in set 1
select * from set1 s2
where s2.setId = 2
and exists (
    select 1 from set1 s1
    where s1.setId = 1
    and s1.userId = s2.userId
)

If you only want the distinct users that are in both sets, group by userId:

-- one row per user that appears in both set 1 and set 2
select min(rowId), userId from set1
where setId in (1,2)
group by userId
having count(distinct setId) = 2

Or, for users that are in one set but not in the other:

-- users only in set 1: they have no row with a setId other than 1
select min(rowId), userId from set1
where setId in (1,2)
group by userId
having count(case when setId <> 1 then 1 end) = 0
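
The two group by queries can also be folded into a single pass that labels every entry as being in both sets or in only one. The following is a rough sketch under stated assumptions, not the answer's own query: it uses Python's sqlite3 module in place of MySQL, a hypothetical table name user_sets, and it groups by (userId, name) so that a changed name counts as a difference.

```python
import sqlite3

# Example rows from the question: (rowId, setId, userId, name).
ROWS = [
    (1, 1, "user1", "John"),
    (2, 1, "user2", "Sally"),
    (3, 1, "user3", "Tom"),
    (4, 2, "user1", "John"),
    (5, 2, "user2", "Thomas"),
    (6, 2, "user4", "Frank"),
]

# One scan over both sets; each (userId, name) group is labelled by where it occurs.
CLASSIFY = """
SELECT userId, name,
       CASE
           WHEN COUNT(DISTINCT setId) = 2 THEN 'both'
           WHEN MIN(setId) = 1 THEN 'only_in_1'
           ELSE 'only_in_2'
       END AS status
FROM user_sets
WHERE setId IN (1, 2)
GROUP BY userId, name
ORDER BY userId, name
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sets ("
    "rowId INTEGER PRIMARY KEY, setId INTEGER, userId TEXT, name TEXT)"
)
conn.executemany("INSERT INTO user_sets VALUES (?, ?, ?, ?)", ROWS)

# Map each (userId, name) pair to its label.
status = {(user, name): label for user, name, label in conn.execute(CLASSIFY)}
for key, label in status.items():
    print(key, label)
```

Whether this beats two not exists queries on a table of this size depends on the available indexes, so it is worth timing both on the real data.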
