sql - Find spectators who have seen the same shows (match multiple rows for each spectator)

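For an assignment, I have to write several SQL queries for a database stored on a PostgreSQL server running PostgreSQL 9.3.0. However, I am stuck on the last query. The database models a reservation system for an opera house. The query is about associating a spectator with the other spectators who attended the same events every time.
The model looks like this: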

Reservations table
id_res |     create_date     |  tickets_presented  | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
     1 | 2015-08-05 17:45:03 |                     |       1 |            1 |   195 |        1
     2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 |      11 |            1 |   150 |        2

Spectators table

id_spectator   | last_name  | first_name |                email                   |     create_time     | age
---------------+------------+------------+----------------------------------------+---------------------+-----
             1 | gonzalez   | colin      | [email protected]               | 2014-03-15 14:21:30 |  22
             2 | bequet     | camille    | [email protected]               | 2014-12-10 15:22:31 |  22

Shows table
 id_show |          name          |  kind  | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
       1 | madama butterfly       | opera  | 2015-09-05        | 19:30:00   | 21:30:00 |         2 |           315 |           630 |           945 |        195 |        150 |        100
       2 | don giovanni           | opera  | 2015-09-12        | 19:30:00   | 21:45:00 |         2 |           315 |           630 |           945 |        195 |        150 |        100

So far, I have started writing a query to get the id of the spectator and the date of the show he is attending. The query looks like this:

SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;

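Could someone help me understand the problem better and give me a hint towards a solution? Thanks in advance.
The result I expect should look something like this: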

id_spectator | other_id_spectators
-------------+--------------------
            1|                 2,3

That is, every time the spectator with id 1 went to a show, spectators 2 and 3 went as well.

Best answer

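"That is, every time the spectator with id 1 went to a show, spectators 2 and 3 went as well."
In other words, you want a list of ...
all spectators who have seen all the shows that a given spectator has seen (and possibly more than the given one).
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is determined dynamically by the given prime spectator.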
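This assumes that (id_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on these two columns (in that order) also provides the most important index.
For best performance in queries 2 and 3 below, also create an index with leading id_show.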
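The uniqueness assumption and the two indexes could be set up roughly as follows. This is a sketch, not the asker's actual schema: the constraint and index names are made up for illustration, and the statements assume the table is called reservations with the id_spectator and id_show columns shown in the question. Adding the constraint fails if duplicate pairs already exist, so the first statement checks for them.

```sql
-- Check the assumption: list (id_spectator, id_show) pairs that occur more than once.
SELECT id_spectator, id_show, count(*) AS ct
FROM   reservations
GROUP  BY id_spectator, id_show
HAVING count(*) > 1;

-- If the check returns no rows, enforce uniqueness.
-- PostgreSQL backs the constraint with an index on (id_spectator, id_show).
ALTER TABLE reservations
   ADD CONSTRAINT reservations_spectator_show_uni  -- hypothetical name
   UNIQUE (id_spectator, id_show);

-- Second index with id_show as leading column, for queries 2 and 3.
CREATE INDEX reservations_show_spectator_idx       -- hypothetical name
   ON reservations (id_show, id_spectator);
```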
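1. Brute force
The basic approach is to form a sorted array of the shows the given user has seen and compare it with the same array for every other user: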

SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM  (
   SELECT id_spectator
   FROM   reservations r
   WHERE  id_spectator <> 1
   GROUP  BY 1
   HAVING        array_agg(id_show ORDER BY id_show)
      @> (SELECT array_agg(id_show ORDER BY id_show)
          FROM   reservations
          WHERE  id_spectator = 1)
   ) sub;

But this can be very expensive for big tables. The whole table has to be processed, and in a rather expensive way, too.
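To see what the @> comparison in the HAVING clause does in isolation, it can be run on literal arrays. The values below are arbitrary illustrations, not data from the question. Containment ignores element order, so the ORDER BY inside array_agg() is not required for the comparison to be correct.

```sql
-- @> tests containment: true if every element of the right array
-- also appears in the left array. Element order does not matter.
SELECT ARRAY[1,2,11] @> ARRAY[1,11] AS sees_all       -- true
     , ARRAY[1,2]    @> ARRAY[1,11] AS misses_one     -- false: 11 is missing
     , ARRAY[11,1]   @> ARRAY[1,11] AS order_ignored; -- true
```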
2. Smarter
Use a CTE to determine the relevant shows, then only consider those:

WITH shows AS (             -- all shows of id 1; 1 row per show
   SELECT id_spectator, id_show
   FROM   reservations
   WHERE  id_spectator = 1  -- your prime spectator here
   )
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM  (
   SELECT s.id_spectator, r.id_spectator AS other
   FROM   shows s
   JOIN   reservations r USING (id_show)
   WHERE  r.id_spectator <> s.id_spectator
   GROUP  BY 1,2
   HAVING count(*) = (SELECT count(*) FROM shows)
   ) sub
GROUP  BY 1;

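A note on query 1: @> is the "contains" operator for arrays, so that query returns all spectators who have seen at least the same shows.
Query 2 is faster than query 1 because only relevant shows are considered.
3. Real smart
To also exclude spectators that do not qualify early on, use a recursive CTE: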

WITH RECURSIVE shows AS (   -- produces exactly 1 row
   SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
   FROM   reservations
   WHERE  id_spectator = 1  -- your prime spectator here
   GROUP  BY 1
   )
, cte AS (
   SELECT r.id_spectator, 1 AS idx
   FROM   shows s
   JOIN   reservations r ON r.id_show = s.shows[1]
   WHERE  r.id_spectator <> s.id_spectator

   UNION  ALL
   SELECT r.id_spectator, idx + 1
   FROM   cte c
   JOIN   reservations r USING (id_spectator)
   JOIN   shows s ON s.shows[c.idx + 1] = r.id_show
   )
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM   shows s
JOIN   cte c ON c.idx = s.ct  -- has an entry for every show
GROUP  BY 1;

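Note that the first CTE is non-recursive. Only the second part is recursive (iterative, really).
This should be fastest for small selections from big tables. Rows that do not qualify are excluded early. The two indexes I mentioned are essential.
SQL Fiddle demonstrating all three.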
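To try the three queries without the fiddle, a minimal test setup might look like the following. The reduced table and its rows are invented for illustration and are not the asker's data. The temporary table shadows any existing reservations table for the duration of the session only.

```sql
-- Hypothetical, reduced version of the reservations table: only the two
-- columns the three queries actually use.
CREATE TEMP TABLE reservations (
   id_spectator int NOT NULL
 , id_show      int NOT NULL
 , UNIQUE (id_spectator, id_show)
);

INSERT INTO reservations (id_spectator, id_show) VALUES
   (1, 1), (1, 11)            -- prime spectator: shows 1 and 11
 , (2, 1), (2, 11)            -- same shows: qualifies
 , (3, 1), (3, 2), (3, 11)    -- same shows plus one more: qualifies
 , (4, 1);                    -- only one of the two shows: does not qualify
```

With this data, each of the three queries should return a single row for spectator 1 listing spectators 2 and 3. The order of elements inside the resulting array is not guaranteed.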