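In a PostgreSQL 9.5.3 database, I have a credit_card_balances table that references a persons table. It tracks the balances of the various credit cards associated with a particular person: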
CREATE TABLE persons (
id serial PRIMARY KEY,
name text
);
CREATE TABLE credit_card_balances (
id serial PRIMARY KEY,
card_provider text,
person int REFERENCES persons,
balance decimal,
timestamp timestamp
);
Example row in credit_card_balances:
id | card_provider | person | balance | timestamp
123 | visa | 1234 | 1.00 | 16-07-26 17:00
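I need to retrieve the set of persons who have both a "visa" and an "amex" card, such that the most recent balance on the "visa" card is greater than the most recent balance on the "amex" card.
For each (person, card_provider) combination, there can be up to 100 rows in the table. Ideally, the output columns would be: person, provider1_balance, provider2_balance, provider1_timestamp, provider2_timestamp
I know I can do something like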
SELECT DISTINCT ON (card_provider) *
FROM credit_card_balances
WHERE person=1234
ORDER BY card_provider, timestamp DESC;
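to get the most recent balance of each card for a specific person. But I am not sure how to do that for all persons and check the condition above, or whether this is even the right approach.
Edit: As partially suggested in an answer, I could also do something like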
SELECT * from credit_card_balances b1, credit_card_balances b2
WHERE b1.person = b2.person
AND (b1.card_provider = 'amex'
AND b1.timestamp in
(SELECT MAX(timestamp)
FROM credit_card_balances
WHERE card_provider = 'amex'))
AND (b2.card_provider = 'visa'
AND <... same as above>)
AND b1.balance > b2.balance;
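But I noticed that this performs poorly, so I don't think it is a good option.
Best answer
This problem combines two classics: greatest-n-per-group and relational-division.
Given your updated specification, with around 100 rows per (person, card_provider), I would expect this query to be substantially faster than what we had so far: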
SELECT p.id AS person -- the LATERAL subqueries expose only balance and timestamp
, a.balance AS amex_balance
, v.balance AS visa_balance
, a.timestamp AS amex_timestamp
, v.timestamp AS visa_timestamp
FROM persons p
CROSS JOIN LATERAL (
SELECT balance, timestamp
FROM credit_card_balances
WHERE person = p.id
AND card_provider = 'amex' -- more selective credit card first to optimize
ORDER BY timestamp DESC
LIMIT 1
) a
JOIN LATERAL (
SELECT balance, timestamp
FROM credit_card_balances
WHERE person = p.id
AND card_provider = 'visa' -- 2nd cc
ORDER BY timestamp DESC
LIMIT 1
) v ON v.balance > a.balance;
Index support is crucial. This would be the ideal case:
CREATE INDEX ON credit_card_balances (person, card_provider, timestamp DESC, balance);
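Adding balance as the last index column only makes sense if you get index-only scans out of it. This assumes that timestamp is defined NOT NULL; otherwise you may need to add NULLS LAST to both the query and the index. Related: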
Optimize GROUP BY query to retrieve latest record per user
What is the difference between LATERAL and a subquery in PostgreSQL?
How to filter SQL results in a has-many-through relation
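If timestamp can be NULL, the NULLS LAST variant mentioned above might look roughly like the following. This is only a sketch: the index name is hypothetical, and the single-person lookup (person 1234, taken from the question) merely stands in for the inner LATERAL subqueries. The EXPLAIN at the end is one way to check whether an "Index Only Scan" node actually appears; that typically also requires the table to have been vacuumed recently so the visibility map is up to date.

```sql
-- Assumption: timestamp is nullable, so NULLs must sort last in descending order.
-- The index name is hypothetical.
CREATE INDEX credit_card_balances_latest_idx
ON credit_card_balances (person, card_provider, timestamp DESC NULLS LAST, balance);

-- The query must use the same sort order to match the index.
SELECT balance, timestamp
FROM credit_card_balances
WHERE person = 1234
AND card_provider = 'amex'
ORDER BY timestamp DESC NULLS LAST
LIMIT 1;

-- Check the plan for an "Index Only Scan" node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT balance, timestamp
FROM credit_card_balances
WHERE person = 1234
AND card_provider = 'amex'
ORDER BY timestamp DESC NULLS LAST
LIMIT 1;
```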
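With only a few rows per (person, card_provider), an approach with DISTINCT ON may be faster. A separate persons table does not help in that case either. The sweet spot depends on many factors. The following assumes at least a few different credit cards.
DISTINCT ON for one credit card, a LATERAL subquery for the other:
SELECT a.person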
, a.balance AS amex_balance
, v.balance AS visa_balance
, a.timestamp AS amex_timestamp
, v.timestamp AS visa_timestamp
FROM (
SELECT DISTINCT ON (person)
person, balance, timestamp
FROM credit_card_balances
WHERE card_provider = 'amex' -- the more selective credit card first
ORDER BY person, timestamp DESC
) a
JOIN LATERAL (
SELECT balance, timestamp
FROM credit_card_balances
WHERE card_provider = 'visa'
AND person = a.person
ORDER BY timestamp DESC
LIMIT 1
) v ON v.balance > a.balance;
DISTINCT ON for each credit card, then join:
SELECT a.person
, a.balance AS amex_balance
, v.balance AS visa_balance
, a.timestamp AS amex_timestamp
, v.timestamp AS visa_timestamp
FROM (
SELECT DISTINCT ON (person)
person, balance, timestamp
FROM credit_card_balances
WHERE card_provider = 'amex'
ORDER BY person, timestamp DESC
) a
JOIN (
SELECT DISTINCT ON (person)
person, balance, timestamp
FROM credit_card_balances
WHERE card_provider = 'visa'
ORDER BY person, timestamp DESC
) v USING (person)
WHERE v.balance > a.balance;
Or, my favorite: one DISTINCT ON for both credit cards, then filtered conditional aggregates:
SELECT person
, max(balance) FILTER (WHERE card_provider = 'amex') AS amex_balance
, max(balance) FILTER (WHERE card_provider = 'visa') AS visa_balance
, max(timestamp) FILTER (WHERE card_provider = 'amex') AS amex_timestamp
, max(timestamp) FILTER (WHERE card_provider = 'visa') AS visa_timestamp
FROM (
SELECT DISTINCT ON (person, card_provider)
person, card_provider, balance, timestamp
FROM credit_card_balances
WHERE card_provider IN ('amex', 'visa')
ORDER BY person, card_provider, timestamp DESC
) c
GROUP BY person
HAVING max(balance) FILTER (WHERE card_provider = 'visa')
> max(balance) FILTER (WHERE card_provider = 'amex');
The aggregate FILTER clause requires Postgres 9.4+. Related:
How can I simplify this game statistics query?
Select first row in each GROUP BY group?
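To sanity-check any of the queries above, a small data set along these lines may help. All names and values are made up for illustration, and the sketch assumes both tables from the question exist and are empty. The rows are chosen so that a query which ignores recency gives the wrong answer: person 1 has an older amex balance that exceeds the visa balance, and person 2 has an older visa balance that exceeds the amex balance.

```sql
-- Hypothetical sample data, only for trying out the queries.
INSERT INTO persons (id, name) VALUES
  (1, 'alice'),
  (2, 'bob'),
  (3, 'carol');

INSERT INTO credit_card_balances (card_provider, person, balance, timestamp) VALUES
  ('amex', 1, 100.00, '2016-07-25 17:00'),  -- older amex row, must be ignored
  ('amex', 1,  50.00, '2016-07-26 17:00'),  -- latest amex for person 1
  ('visa', 1,  80.00, '2016-07-26 18:00'),  -- latest visa for person 1
  ('amex', 2, 200.00, '2016-07-26 17:00'),  -- latest amex for person 2
  ('visa', 2, 250.00, '2016-07-25 17:00'),  -- older visa row, must be ignored
  ('visa', 2, 150.00, '2016-07-26 17:00'),  -- latest visa for person 2
  ('visa', 3,  10.00, '2016-07-26 17:00');  -- person 3 has no amex card
```

With this data, each query should return exactly one row, for person 1, with an amex balance of 50.00 and a visa balance of 80.00. Person 2 is excluded because the latest visa balance (150.00) is below the latest amex balance (200.00), and person 3 is excluded for lack of an amex card.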
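Regarding "sql - selecting persons whose latest balance on one credit card is greater than their latest balance on another", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/38598124/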