I have the following two tables in a PostgreSQL database:

dummy=# select * from employee;
 id | name
----+-------
  1 | John
  2 | Susan
  3 | Jim
  4 | Sarah
(4 rows)

dummy=# select * from stats;
 id | arrival  |    day     | employee_id
----+----------+------------+-------------
  2 | 08:31:34 | monday     |           2
  4 | 08:15:00 | monday     |           3
  5 | 08:43:00 | monday     |           4
  1 | 08:34:00 | monday     |           1
  7 | 08:29:00 | midweek    |           1
  8 | 08:31:00 | midweek    |           2
  9 | 08:10:00 | midweek    |           3
 10 | 08:40:00 | midweek    |           4
 11 | 08:28:00 | midweek    |           1
 12 | 08:33:00 | midweek    |           2
 14 | 08:21:00 | midweek    |           3
 15 | 08:45:00 | midweek    |           4
 16 | 08:25:00 | midweek    |           1
 17 | 08:35:00 | midweek    |           2
 18 | 08:44:00 | midweek    |           4
 19 | 08:10:00 | friday     |           1
 20 | 08:40:00 | friday     |           2
 21 | 08:30:00 | friday     |           3
 22 | 08:30:00 | friday     |           4
(19 rows)
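To try the queries on this page, the sample data can be recreated with a setup script along these lines. This is only a sketch: the column types (`integer`, `text`, `time`) and the foreign key are assumptions inferred from the psql output above, not taken from the original schema.

```sql
-- Assumed schema: the types and constraints are guesses based on the output shown.
create table employee (
    id   integer primary key,
    name text not null
);

create table stats (
    id          integer primary key,
    arrival     time not null,
    day         text not null,
    employee_id integer not null references employee (id)
);

insert into employee (id, name) values
    (1, 'John'), (2, 'Susan'), (3, 'Jim'), (4, 'Sarah');

-- The same 19 rows as in the listing above.
insert into stats (id, arrival, day, employee_id) values
    (2,  '08:31:34', 'monday',  2),
    (4,  '08:15:00', 'monday',  3),
    (5,  '08:43:00', 'monday',  4),
    (1,  '08:34:00', 'monday',  1),
    (7,  '08:29:00', 'midweek', 1),
    (8,  '08:31:00', 'midweek', 2),
    (9,  '08:10:00', 'midweek', 3),
    (10, '08:40:00', 'midweek', 4),
    (11, '08:28:00', 'midweek', 1),
    (12, '08:33:00', 'midweek', 2),
    (14, '08:21:00', 'midweek', 3),
    (15, '08:45:00', 'midweek', 4),
    (16, '08:25:00', 'midweek', 1),
    (17, '08:35:00', 'midweek', 2),
    (18, '08:44:00', 'midweek', 4),
    (19, '08:10:00', 'friday',  1),
    (20, '08:40:00', 'friday',  2),
    (21, '08:30:00', 'friday',  3),
    (22, '08:30:00', 'friday',  4);
```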

I want to select all employees who arrived between 8:25 and 8:35. I can do that relatively easily with the following query:
SELECT * FROM stats
WHERE
    arrival >= (time '8:30' - interval '5 minutes')
AND
    arrival <= (time '8:30' + interval '5 minutes')
AND
    (day = 'midweek' or day = 'friday');

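However, there is one more criterion: I only want to select employees who arrived within the above time window at least 60% of the time. This is where I am stuck. I don't know how to calculate that ratio.
What would a query that satisfies all of these conditions look like?
Clarification
Apparently, the description of the ratio above was misleading.
When calculating the ratio, only rows that match the criterion (day = 'midweek' or day = 'friday') should be considered. So in the sample data, John and Susan each showed up four times on midweek or friday. Three of those four times they were on time. Therefore, the ratio for Susan and John is 75%.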

Best answer

Use common table expressions to calculate the required counts, for example:

with in_time as (
    select *
    from stats
    where arrival >= (time '8:30' - interval '5 minutes')
    and arrival <= (time '8:30' + interval '5 minutes')
    and (day = 'midweek' or day = 'friday')
),
count_in_time as (
    select employee_id, count(*)
    from in_time
    group by employee_id
),
total_count as (
    select employee_id, count(*)
    from stats
    where day = 'midweek' or day = 'friday'
    group by employee_id
)
select
    i.*,
    c.count as in_time,
    t.count as total_count,
    round(c.count* 100.0/t.count, 2) as ratio
from in_time i
join count_in_time c using(employee_id)
join total_count t using(employee_id);

Result:
 id | arrival  |   day   | employee_id | in_time | total_count | ratio
----+----------+---------+-------------+---------+-------------+-------
 16 | 08:25:00 | midweek |           1 |       3 |           4 | 75.00
 11 | 08:28:00 | midweek |           1 |       3 |           4 | 75.00
  7 | 08:29:00 | midweek |           1 |       3 |           4 | 75.00
 17 | 08:35:00 | midweek |           2 |       3 |           4 | 75.00
 12 | 08:33:00 | midweek |           2 |       3 |           4 | 75.00
  8 | 08:31:00 | midweek |           2 |       3 |           4 | 75.00
 21 | 08:30:00 | friday  |           3 |       1 |           3 | 33.33
 22 | 08:30:00 | friday  |           4 |       1 |           4 | 25.00
(8 rows)

You can add an appropriate condition in the where clause of the final query.
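As a sketch of what such a condition could look like, the query below repeats the CTEs above and keeps only rows for employees whose ratio is at least 60%. The threshold expression and the `order by` are additions for illustration; `between` is used as a shorthand for the same inclusive 8:25 to 8:35 window.

```sql
with in_time as (
    -- arrivals inside the window, on the days of interest
    select *
    from stats
    where arrival between time '8:30' - interval '5 minutes'
                      and time '8:30' + interval '5 minutes'
    and day in ('midweek', 'friday')
),
count_in_time as (
    -- number of on-time arrivals per employee
    select employee_id, count(*)
    from in_time
    group by employee_id
),
total_count as (
    -- number of all arrivals per employee on the days of interest
    select employee_id, count(*)
    from stats
    where day in ('midweek', 'friday')
    group by employee_id
)
select
    i.*,
    c.count as in_time,
    t.count as total_count,
    round(c.count * 100.0 / t.count, 2) as ratio
from in_time i
join count_in_time c using (employee_id)
join total_count t using (employee_id)
-- added condition: keep only employees who were on time at least 60% of the time
where c.count * 100.0 / t.count >= 60
order by i.employee_id, i.arrival;
```

With the sample data this should leave only the rows for employees 1 and 2, since the other two employees have ratios of 33.33 and 25.00 in the result above.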
If you only want aggregated data with the employees and their ratios, use count() with filter:
select employee_id, name, in_time* 1.0/ total as ratio
from (
    select
        employee_id,
        count(*) filter (where arrival >= time '8:30' - interval '5 minutes' and arrival <= time '8:30' + interval '5 minutes') as in_time,
        count(*) as total
    from stats
    where day in ('midweek', 'friday')
    group by 1
    ) s
join employee e on e.id = s.employee_id
where in_time* 1.0/ total >= 0.6;

 employee_id | name  |         ratio
-------------+-------+------------------------
           1 | John  | 0.75000000000000000000
           2 | Susan | 0.75000000000000000000
(2 rows)
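A possible variation, not part of the original answer: because PostgreSQL can cast a boolean to an integer (true becomes 1, false becomes 0), the ratio can also be written as an average of the on-time flag, with the 60% threshold moved into a having clause. This is a minimal sketch under that assumption; the rounding to two decimals is an arbitrary choice.

```sql
select
    e.id as employee_id,
    e.name,
    -- average of 1 (on time) and 0 (not on time) is the on-time ratio
    round(avg((s.arrival between time '08:25' and time '08:35')::int), 2) as ratio
from stats s
join employee e on e.id = s.employee_id
where s.day in ('midweek', 'friday')
group by e.id, e.name
having avg((s.arrival between time '08:25' and time '08:35')::int) >= 0.6
order by e.id;
```

This avoids the subquery, at the cost of repeating the window condition in both the select list and the having clause.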

Regarding database - Postgres: calculating the ratio of table entries matching criteria, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47079995/
