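In a PostgreSQL 9.5.3 database, I have a credit_card_balances table referencing a persons table. It tracks the balances of the various credit cards associated with a particular person: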

CREATE TABLE persons (
  id serial PRIMARY KEY,
  name text
);

CREATE TABLE credit_card_balances (
  id serial PRIMARY KEY,
  card_provider text,
  person int REFERENCES persons,
  balance decimal,
  timestamp timestamp
);

Example row of credit_card_balances:
id  |  card_provider | person  | balance | timestamp
123 |  visa          | 1234    | 1.00    | 16-07-26 17:00

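I need to retrieve the set of persons who have both a 'visa' and an 'amex' card, such that the most recent balance on the 'visa' card is greater than the most recent balance on the 'amex' card.
For each (person, card_provider) combination there can be up to about 100 rows in the table. Ideally, the output columns would be: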
person, provider1_balance, provider2_balance, provider1_timestamp, provider2_timestamp

I know I can do something like
SELECT DISTINCT ON (card_provider) *
FROM credit_card_balances
WHERE person=1234
ORDER BY card_provider, timestamp DESC;

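to get the most recent balance of each card for a particular person. But I am not sure how to do that for all persons and check the condition above, or whether this is even the right approach.
Edit: As partly suggested in an answer, I can also do something like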
SELECT * from credit_card_balances b1, credit_card_balances b2
WHERE b1.person = b2.person
AND (b1.card_provider = 'amex'
     AND b1.timestamp in
        (SELECT MAX(timestamp)
         FROM credit_card_balances
         WHERE card_provider = 'amex'))

AND (b2.card_provider = 'visa'
     AND <... same as above>)
AND b1.balance > b2.balance;

But I noticed this performs terribly, so I don't think it is a good option.

Best answer

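This problem is a combination of two classics: greatest-n-per-group and relational-division.
Given your updated specification, with around 100 rows per (person, card_provider), I would expect this query to be substantially faster than what we have so far: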

SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM   persons p
CROSS  JOIN LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances
   WHERE  person = p.id
   AND    card_provider = 'amex'  -- more selective credit card first to optimize
   ORDER  BY timestamp DESC
   LIMIT  1
   ) a
JOIN   LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances
   WHERE  person = p.id
   AND    card_provider = 'visa'  -- 2nd cc
   ORDER  BY timestamp DESC
   LIMIT  1
   ) v ON v.balance > a.balance;

Index support is crucial. This index would be ideal for the case:
CREATE INDEX ON credit_card_balances (person, card_provider, timestamp DESC, balance);

Adding balance as the last index column only makes sense if you get index-only scans out of it.
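One way to check whether the planner really uses an index-only scan is to look at the execution plan. The following is only a sketch: the person id 1234 is a placeholder taken from the question's sample row, and the plan you get depends on your data and statistics.

```sql
-- Index-only scans rely on the visibility map, so vacuum the table first.
VACUUM ANALYZE credit_card_balances;

-- Inspect the plan for one of the "latest row" lookups used in the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT balance, timestamp
FROM   credit_card_balances
WHERE  person = 1234              -- placeholder id
AND    card_provider = 'amex'
ORDER  BY timestamp DESC
LIMIT  1;
```

If the plan shows an "Index Only Scan" node with few or no heap fetches, the trailing balance column is paying off. If it shows a plain "Index Scan", the extra column mostly just makes the index bigger.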
This assumes that timestamp is defined NOT NULL. Otherwise you may need to add NULLS LAST to both the query and the index.
Related:
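If timestamp can be NULL, the adjusted index and lookup might look like the sketch below. It assumes the table definition from the question; the person id 1234 is again just a placeholder.

```sql
-- Index with a sort order that matches the query below.
CREATE INDEX ON credit_card_balances
   (person, card_provider, timestamp DESC NULLS LAST, balance);

-- With DESC alone, NULL timestamps sort first and would be picked by LIMIT 1.
-- NULLS LAST pushes them to the end, so the latest known row wins.
SELECT balance, timestamp
FROM   credit_card_balances
WHERE  person = 1234              -- placeholder id
AND    card_provider = 'amex'
ORDER  BY timestamp DESC NULLS LAST
LIMIT  1;
```

The ORDER BY in the query has to match the sort order of the index for the index to deliver rows in the desired order.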
Optimize GROUP BY query to retrieve latest record per user
What is the difference between LATERAL and a subquery in PostgreSQL?
How to filter SQL results in a has-many-through relation
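For only a few rows per (person, card_provider), an approach with DISTINCT ON might be faster. A separate persons table would not help in that case, either. The optimum depends on many factors.
This assumes at least several different credit cards.
DISTINCT ON for one credit card, a LATERAL subquery for the other: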
SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances
   WHERE  card_provider = 'amex'  -- the more selective credit card first
   ORDER  BY person, timestamp DESC
   ) a
JOIN  LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances
   WHERE  card_provider = 'visa'
   AND    person = a.person
   ORDER  BY timestamp DESC
   LIMIT  1
   ) v ON v.balance > a.balance;

DISTINCT ON for each credit card, then join:
SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances
   WHERE  card_provider = 'amex'
   ORDER  BY person, timestamp DESC
   ) a
JOIN  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances
   WHERE  card_provider = 'visa'
   ORDER  BY person, timestamp DESC
   ) v USING (person)
WHERE  v.balance > a.balance;

Or, my favorite: one DISTINCT ON for both credit cards, then a filtered aggregate with a HAVING condition:
SELECT person
     , max(balance)   FILTER (WHERE card_provider = 'amex') AS amex_balance
     , max(balance)   FILTER (WHERE card_provider = 'visa') AS visa_balance
     , max(timestamp) FILTER (WHERE card_provider = 'amex') AS amex_timestamp
     , max(timestamp) FILTER (WHERE card_provider = 'visa') AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person, card_provider)
          person, card_provider, balance, timestamp
   FROM   credit_card_balances
   WHERE  card_provider IN ('amex', 'visa')
   ORDER  BY person, card_provider, timestamp DESC
   ) c
GROUP  BY person
HAVING max(balance) FILTER (WHERE card_provider = 'visa')
     > max(balance) FILTER (WHERE card_provider = 'amex');

The aggregate FILTER clause requires Postgres 9.4 or later. Related:
How can I simplify this game statistics query?
Select first row in each GROUP BY group?
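To sanity-check any of the queries above, a tiny data set like the following may help. All names, ids, and values are made up for illustration, and the tables from the question are assumed to exist and be empty.

```sql
INSERT INTO persons (id, name) VALUES
  (1, 'alice'), (2, 'bob'), (3, 'carol');

INSERT INTO credit_card_balances (card_provider, person, balance, timestamp) VALUES
  ('amex', 1, 50.00, '2016-07-25 17:00'),  -- older amex row, must be ignored
  ('amex', 1, 10.00, '2016-07-26 17:00'),  -- latest amex for person 1
  ('visa', 1, 20.00, '2016-07-26 18:00'),  -- latest visa for person 1: 20 > 10
  ('amex', 2, 30.00, '2016-07-26 17:00'),  -- latest amex for person 2
  ('visa', 2, 99.00, '2016-07-25 17:00'),  -- older visa row, must be ignored
  ('visa', 2,  5.00, '2016-07-26 17:00'),  -- latest visa for person 2: 5 < 30
  ('visa', 3, 40.00, '2016-07-26 17:00');  -- person 3 has no amex card
```

With this data, each query should return exactly one row: person 1, with an amex balance of 10.00 and a visa balance of 20.00. Person 2 fails the balance comparison, and person 3 is dropped for lack of an amex card.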

This question about selecting persons whose latest balance on one credit card is greater than their latest balance on another comes from Stack Overflow: https://stackoverflow.com/questions/38598124/
