SQL query to find rows with a specific number of associations

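Using Postgres, I have a schema with conversations and conversationUsers. Each conversation has many conversationUsers. I want to find conversations that have exactly the specified conversationUsers. In other words, given a set of userIds (e.g. [1, 4, 6]), I want to find the conversations that contain only those users and no others.

So far I have tried this:

    SELECT c."conversationId"
    FROM "conversationUsers" c
    WHERE c."userId" IN (1, 4)
    GROUP BY c."conversationId"
    HAVING COUNT(c."userId") = 2;

Unfortunately, this also returns conversations that include these two users among others. For example, a conversation that additionally contains "userId" 5 is returned as well.

Best answer

This is a case of relational-division, with the added special requirement that the same conversation shall have no additional users.

I assume a PK on table "conversationUsers" on ("userId", "conversationId"). It enforces uniqueness of the combination and NOT NULL, and it also implicitly provides the index that is essential for performance. The columns of the multicolumn PK must be in this order! Otherwise you have to do more. About the order of index columns: Is a composite index also good for queries on the first field?

For the basic query, there is the "brute force" approach: count the matching users for all conversations of all given users, then keep the conversations that match all given users. This is fine for small tables, short input arrays, or few conversations per user, but it doesn't scale well:

    SELECT "conversationId"
    FROM   "conversationUsers" c
    WHERE  "userId" = ANY ('{1,4,6}'::int[])
    GROUP  BY 1
    HAVING count(*) = array_length('{1,4,6}'::int[], 1)
    AND    NOT EXISTS (
       SELECT FROM "conversationUsers"
       WHERE  "conversationId" = c."conversationId"
       AND    "userId" <> ALL('{1,4,6}'::int[])
       );

The count(*) comparison works because the PK guarantees each user is counted at most once per conversation (the input array must not contain duplicates either). The NOT EXISTS anti-semi-join then eliminates conversations with other users. More: How do I (or can I) SELECT DISTINCT on multiple columns?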
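To see the difference between the question's query and the brute-force query, here is a minimal sketch using Python's built-in sqlite3 module on made-up sample data. It approximates the Postgres version rather than reproducing it: SQLite has no array type, so '{1,4,6}'::int[] becomes an IN list of placeholders, array_length becomes len(users), and <> ALL(...) becomes NOT IN (...), which is equivalent here because no IDs are NULL. The helper names make_db, naive and exact are hypothetical.

```python
import sqlite3


def make_db():
    """In-memory stand-in for the "conversationUsers" table (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.execute('''CREATE TABLE "conversationUsers" (
                     "conversationId" integer NOT NULL,
                     "userId"         integer NOT NULL,
                     PRIMARY KEY ("userId", "conversationId"))''')
    rows = [(10, 1), (10, 4), (10, 6),           # exactly users 1, 4, 6
            (20, 1), (20, 4), (20, 5), (20, 6),  # users 1, 4, 6 plus user 5
            (30, 1), (30, 4)]                    # only users 1, 4
    con.executemany('INSERT INTO "conversationUsers" VALUES (?, ?)', rows)
    return con


def placeholders(users):
    """Build "?, ?, ?" for an IN list (SQLite has no array parameters)."""
    return ", ".join(["?"] * len(users))


def naive(con, users):
    """The question's approach: also returns conversations with extra users."""
    sql = f'''SELECT "conversationId" FROM "conversationUsers"
              WHERE "userId" IN ({placeholders(users)})
              GROUP BY "conversationId"
              HAVING count(*) = ?
              ORDER BY 1'''
    return [r[0] for r in con.execute(sql, (*users, len(users)))]


def exact(con, users):
    """Brute force plus a NOT EXISTS anti-semi-join that drops extra users."""
    ph = placeholders(users)
    sql = f'''SELECT "conversationId" FROM "conversationUsers" c
              WHERE "userId" IN ({ph})
              GROUP BY "conversationId"
              HAVING count(*) = ?
                 AND NOT EXISTS (
                       SELECT 1 FROM "conversationUsers" x
                       WHERE x."conversationId" = c."conversationId"
                         AND x."userId" NOT IN ({ph}))
              ORDER BY 1'''
    # Placeholders are numbered left to right: IN list, count, NOT IN list.
    return [r[0] for r in con.execute(sql, (*users, len(users), *users))]


if __name__ == "__main__":
    con = make_db()
    print(naive(con, [1, 4, 6]))  # [10, 20]: conversation 20 also has user 5
    print(exact(con, [1, 4, 6]))  # [10]
```

The only change from the question's query is the NOT EXISTS condition; the GROUP BY / HAVING part alone can only check that all given users are present, never that nobody else is.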
Alternative techniques: Select rows which are not present in other table

There are various other, (much) faster relational-division query techniques, but the fastest ones are not well suited for a dynamic number of user IDs: How to filter SQL results in a has-many-through relation

For a query that is fast and can also handle a dynamic number of user IDs, consider a recursive CTE:

    WITH RECURSIVE rcte AS (
       SELECT "conversationId", 1 AS idx
       FROM   "conversationUsers"
       WHERE  "userId" = ('{1,4,6}'::int[])[1]
       UNION ALL
       SELECT c."conversationId", r.idx + 1
       FROM   rcte r
       JOIN   "conversationUsers" c USING ("conversationId")
       WHERE  c."userId" = ('{1,4,6}'::int[])[r.idx + 1]
       )
    SELECT "conversationId"
    FROM   rcte r
    WHERE  idx = array_length(('{1,4,6}'::int[]), 1)
    AND    NOT EXISTS (
       SELECT FROM "conversationUsers"
       WHERE  "conversationId" = r."conversationId"
       AND    "userId" <> ALL('{1,4,6}'::int[])
       );

For ease of use, wrap this in a function or a prepared statement, like:

    PREPARE conversations(int[]) AS
    WITH RECURSIVE rcte AS (
       SELECT "conversationId", 1 AS idx
       FROM   "conversationUsers"
       WHERE  "userId" = $1[1]
       UNION ALL
       SELECT c."conversationId", r.idx + 1
       FROM   rcte r
       JOIN   "conversationUsers" c USING ("conversationId")
       WHERE  c."userId" = $1[r.idx + 1]
       )
    SELECT "conversationId"
    FROM   rcte r
    WHERE  idx = array_length($1, 1)
    AND    NOT EXISTS (
       SELECT FROM "conversationUsers"
       WHERE  "conversationId" = r."conversationId"
       AND    "userId" <> ALL($1)
       );

Call:

    EXECUTE conversations('{1,4,6}');

db<>fiddle here (also demonstrates a function)

There is room for improvement. For top performance, put the users with the fewest conversations first in the input array, so that as many rows as possible are eliminated early. For even better performance, you can generate a non-dynamic, non-recursive query dynamically (using one of the fast techniques from the first link) and execute that in turn. You could even wrap it all in a single plpgsql function with dynamic SQL. More explanation: Using same column multiple times in WHERE clause

Alternative: MV for a sparsely written table

If the table "conversationUsers" is mostly read-only (old conversations are unlikely to change), you can use a MATERIALIZED VIEW with the users pre-aggregated into sorted arrays, and create a plain btree index on that array column:

    CREATE MATERIALIZED VIEW mv_conversation_users AS
    SELECT "conversationId", array_agg("userId") AS users  -- sorted array
    FROM  (
       SELECT "conversationId", "userId"
       FROM   "conversationUsers"
       ORDER  BY 1, 2
       ) sub
    GROUP  BY 1
    ORDER  BY 1;

    CREATE INDEX ON mv_conversation_users (users) INCLUDE ("conversationId");

The covering index shown requires Postgres 11. See: https://dba.stackexchange.com/a/207938/3684

About sorting rows in a subquery: How to apply ORDER BY and LIMIT in combination with an aggregate function?

In older versions, use a plain multicolumn index on (users, "conversationId") instead. With very long arrays, a hash index might make sense in Postgres 10 or later.

Then the much faster query is simply:

    SELECT "conversationId"
    FROM   mv_conversation_users c
    WHERE  users = '{1,4,6}'::int[];  -- sorted array!

db<>fiddle here

You have to weigh the added cost for storage, writes and maintenance against the benefit for read performance.

Aside: consider legal, lower-case identifiers that need no double quotes, e.g. conversation_id instead of "conversationId": Are PostgreSQL column names case-sensitive?

Regarding "SQL query to find rows with a specific number of associations", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53235353/
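The recursive CTE narrows the candidates one user at a time: the non-recursive term takes the conversations of the first user, each recursive step keeps only those that also contain the next user, and the final NOT EXISTS removes conversations that contain anyone else. Below is a minimal sketch of that technique with Python's sqlite3 on hypothetical sample data. Because SQLite has no arrays, the input array is replaced by an assumed helper CTE wanted(idx, uid) built from VALUES, and the recursion joins on wanted.idx = r.idx + 1 instead of subscripting $1[idx + 1]. The function fewest_first sketches the advice to put the users with the fewest conversations first. All function names are hypothetical.

```python
import sqlite3


def make_db():
    """In-memory stand-in for "conversationUsers" (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.execute('''CREATE TABLE "conversationUsers" (
                     "conversationId" integer NOT NULL,
                     "userId"         integer NOT NULL,
                     PRIMARY KEY ("userId", "conversationId"))''')
    con.executemany('INSERT INTO "conversationUsers" VALUES (?, ?)',
                    [(10, 1), (10, 4), (10, 6),
                     (20, 1), (20, 4), (20, 5), (20, 6),
                     (30, 1), (30, 4)])
    return con


def fewest_first(con, users):
    """Order input users by how many conversations each is in (ascending)."""
    def n_conv(user):
        return con.execute(
            'SELECT count(*) FROM "conversationUsers" WHERE "userId" = ?',
            (user,)).fetchone()[0]
    return sorted(users, key=n_conv)


def exact_rcte(con, users):
    """Conversations whose users are exactly `users` (no duplicates assumed)."""
    if not users:
        return []
    values = ", ".join(["(?, ?)"] * len(users))
    # Flatten [(1, u1), (2, u2), ...] into positional parameters.
    params = [x for i, u in enumerate(users, start=1) for x in (i, u)]
    sql = f'''
    WITH RECURSIVE
      wanted(idx, uid) AS (VALUES {values}),
      rcte(cid, idx) AS (
        -- non-recursive term: conversations of the first user
        SELECT c."conversationId", 1
        FROM "conversationUsers" c
        JOIN wanted w ON w.idx = 1 AND c."userId" = w.uid
        UNION ALL
        -- each step keeps candidates that also contain the next user
        SELECT c."conversationId", r.idx + 1
        FROM rcte r
        JOIN wanted w ON w.idx = r.idx + 1
        JOIN "conversationUsers" c
          ON c."conversationId" = r.cid AND c."userId" = w.uid)
    SELECT cid FROM rcte r
    WHERE r.idx = ?
      AND NOT EXISTS (SELECT 1 FROM "conversationUsers" x
                      WHERE x."conversationId" = r.cid
                        AND x."userId" NOT IN (SELECT uid FROM wanted))
    ORDER BY 1'''
    return [row[0] for row in con.execute(sql, (*params, len(users)))]


if __name__ == "__main__":
    con = make_db()
    users = fewest_first(con, [1, 4, 6])
    print(users)                   # [6, 1, 4]: user 6 is in the fewest conversations
    print(exact_rcte(con, users))  # [10]
```

The recursion stops by itself once r.idx + 1 has no matching row in wanted, so it never runs more steps than there are input users; reordering the input only changes how many candidate rows survive each step, not the final result.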
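The materialized view turns the relational division into a single equality lookup on a sorted array. As a rough in-memory analogue (not Postgres), the sketch below groups (conversationId, userId) pairs into sorted tuples, with a dict standing in for the btree index on users. build_mv and lookup are hypothetical names and the rows are made-up sample data. As in the SQL, the lookup key must be sorted before comparing.

```python
from collections import defaultdict


def build_mv(rows):
    """Group (conversationId, userId) pairs like mv_conversation_users.

    Returns a dict mapping a sorted tuple of user IDs (the "users" array)
    to the list of conversation IDs that have exactly that set of users.
    """
    members = defaultdict(list)
    for conv_id, user_id in rows:
        members[conv_id].append(user_id)
    mv = defaultdict(list)
    for conv_id, users in sorted(members.items()):
        mv[tuple(sorted(users))].append(conv_id)
    return mv


def lookup(mv, users):
    """Equality lookup; sort the key first, like '{1,4,6}'::int[] in the SQL."""
    return mv.get(tuple(sorted(users)), [])


if __name__ == "__main__":
    rows = [(10, 1), (10, 4), (10, 6),
            (20, 1), (20, 4), (20, 5), (20, 6),
            (30, 1), (30, 4)]
    mv = build_mv(rows)
    print(lookup(mv, [6, 1, 4]))  # [10]
    print(lookup(mv, [1, 4]))     # [30]
```

In Postgres the view does not reflect later writes to "conversationUsers" until you run REFRESH MATERIALIZED VIEW mv_conversation_users; likewise this dict has to be rebuilt, which is the maintenance cost to weigh against the faster reads.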