Problem description

I'm attempting to crunch some binge-viewing stats, and I'd like to find out how long the longest binge streak is. A binge is multiple programs viewed in succession, one after another, no more than 2 hours apart. The data looks like this:

datetime                user_id program
2013-09-01 00:01:18     1       A
2013-09-10 14:03:14     1       B
2013-09-20 17:02:12     2       A  
2013-09-21 00:03:22     2       C  <-- user 2 binge start
2013-09-21 01:23:22     2       M
2013-09-21 03:03:22     2       E
2013-09-21 04:03:22     2       F  
2013-09-21 06:03:22     2       G  <-- user 2 binge end
2013-09-21 09:03:22     2       H
2013-09-03 18:21:09     3       D
2013-09-21 09:03:22     2       H
2013-09-24 19:21:00     2       X  <-- user 2 second binge start
2013-09-24 20:21:00     2       Y
2013-09-24 21:21:00     2       Z  <-- user 2 second binge end

In this example, user 2 had a binge that lasted 6 hours and, later, another that lasted 2 hours.
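To make the 2-hour rule concrete, here is a small Python illustration that lists the gap between each pair of user 2's consecutive views. It is not part of the question. It makes these assumptions:

* The timestamps use the `YYYY-MM-DD HH:MM:SS` form shown above.
* The repeated 2013-09-21 09:03:22 H row is listed only once.
* `gaps` and `MAX_GAP` are hypothetical names.

```python
from datetime import datetime, timedelta

# User 2's views from the sample data, in time order (repeated H row listed once).
USER2 = [
    ("2013-09-20 17:02:12", "A"),
    ("2013-09-21 00:03:22", "C"),
    ("2013-09-21 01:23:22", "M"),
    ("2013-09-21 03:03:22", "E"),
    ("2013-09-21 04:03:22", "F"),
    ("2013-09-21 06:03:22", "G"),
    ("2013-09-21 09:03:22", "H"),
    ("2013-09-24 19:21:00", "X"),
    ("2013-09-24 20:21:00", "Y"),
    ("2013-09-24 21:21:00", "Z"),
]

MAX_GAP = timedelta(hours=2)  # "no more than 2 hours apart"


def gaps(views):
    """Yield (program, gap since the previous view, whether the gap exceeds MAX_GAP)."""
    prev = None
    for ts, program in views:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        gap = None if prev is None else when - prev
        # A gap of exactly 2 hours still continues the streak.
        yield program, gap, gap is not None and gap > MAX_GAP
        prev = when


for program, gap, too_long in gaps(USER2):
    print(program, gap, "<- more than 2 hours" if too_long else "")
```

Only C, H and X follow a gap longer than 2 hours:

* C through G (00:03:22 to 06:03:22) forms the 6-hour binge.
* X through Z forms the 2-hour binge.
* A and H are lone views.

G arrives exactly 2 hours after F, which still counts as "no more than 2 hours apart".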

The end result I would like is something like:

user_id     binge     length
2           1         6 hours
2           2         2 hours

Can this be calculated directly in the database?

Recommended answer

This is a problem of identifying sequences (streaks) in the data. My preferred way of doing this is to:


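  • Use the LAG function to identify the start of each streak

  • Use a running SUM over those start flags to assign a unique number to each streak

  • Group by that unique number for further processing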
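The three steps above can be sketched in plain Python to show what each window function contributes. This is an illustrative re-implementation, not the SQL itself, and it makes these assumptions:

* It uses the sample rows from the question.
* It drops exact duplicate rows with `set()`, because the listing repeats the 2013-09-21 09:03:22 H view.
* `binges` is a hypothetical name.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (datetime, user_id, program), exactly as listed.
ROWS = [
    ("2013-09-01 00:01:18", 1, "A"),
    ("2013-09-10 14:03:14", 1, "B"),
    ("2013-09-20 17:02:12", 2, "A"),
    ("2013-09-21 00:03:22", 2, "C"),
    ("2013-09-21 01:23:22", 2, "M"),
    ("2013-09-21 03:03:22", 2, "E"),
    ("2013-09-21 04:03:22", 2, "F"),
    ("2013-09-21 06:03:22", 2, "G"),
    ("2013-09-21 09:03:22", 2, "H"),
    ("2013-09-03 18:21:09", 3, "D"),
    ("2013-09-21 09:03:22", 2, "H"),
    ("2013-09-24 19:21:00", 2, "X"),
    ("2013-09-24 20:21:00", 2, "Y"),
    ("2013-09-24 21:21:00", 2, "Z"),
]


def binges(rows, max_gap=timedelta(hours=2)):
    """Return (user_id, streak_no, length) for every streak of 2+ programmes."""
    # Sort by user, then time (the ORDER BY inside the window functions);
    # set() drops exact duplicate rows such as the repeated H view (assumption).
    views = sorted(
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), prog)
        for ts, user, prog in set(rows)
    )
    tagged = []  # (user, streak_no, time)
    for user, user_views in groupby(views, key=lambda v: v[0]):  # PARTITION BY user_id
        prev, streak_no = None, 0
        for _, when, _prog in user_views:
            # Step 1 (LAG): 1 if this view starts more than max_gap after the previous one.
            starts_new = 1 if prev is not None and when - prev > max_gap else 0
            # Step 2 (running SUM): the streak number is the count of starts so far.
            streak_no += starts_new
            tagged.append((user, streak_no, when))
            prev = when
    # Step 3 (GROUP BY user, streak): keep streaks with more than one programme.
    result = []
    for (user, streak_no), members in groupby(tagged, key=lambda r: (r[0], r[1])):
        times = [when for _, _, when in members]
        if len(times) > 1:
            result.append((user, streak_no, max(times) - min(times)))
    return result


print(binges(ROWS))
```

For the sample data, the result has two streaks for user 2:

* Streak 1 is the 6-hour binge.
* Streak 3 is the 2-hour binge.

Streak 2 is the lone H view, which is dropped for having only one programme. `max(binges(ROWS), key=lambda r: r[2])` then picks the longest streak.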

Query:

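-- Assumes a table binge(dt, user_id, programme) holding the question's
-- datetime, user_id and program columns.
with start_grp as (
  -- grp_start = 1 when a view starts more than 2 hours after the same user's
  -- previous view; the first view has no previous row (LAG returns NULL), so it gets 0
  select dt, user_id, programme,
         case when dt - lag(dt,1) over (partition by user_id order by dt) 
                   > interval '0 day 2:00:00'
              then 1
              else 0
         end grp_start
  from binge
  ),
assign_grp as (
  -- the running total of grp_start gives each streak its own number per user
  select dt, user_id, programme,
         sum(grp_start) over (partition by user_id order by dt) grp
  from start_grp)
-- one row per streak; a binge needs more than one programme
select user_id, grp as binge, max(dt) - min(dt) as binge_length
from assign_grp
group by user_id, grp
having count(programme) > 1;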
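The query above relies on timestamp and interval arithmetic that not every database supports. To try the same logic locally, here is a port run through Python's built-in `sqlite3` module. It is a sketch under stated assumptions, not the answer's exact SQL:

* SQLite has no interval type, so each `dt` is converted to Unix seconds with `strftime('%s', ...)`, and gaps are compared against 7200 seconds.
* Window functions need SQLite 3.25 or newer.
* The repeated H row is loaded only once. If both copies are kept, the two identical rows form an extra zero-length "binge".

```python
import sqlite3

# Sample rows from the question: (datetime, user_id, program).
# The repeated "2013-09-21 09:03:22, 2, H" row is included only once (assumption).
ROWS = [
    ("2013-09-01 00:01:18", 1, "A"),
    ("2013-09-10 14:03:14", 1, "B"),
    ("2013-09-20 17:02:12", 2, "A"),
    ("2013-09-21 00:03:22", 2, "C"),
    ("2013-09-21 01:23:22", 2, "M"),
    ("2013-09-21 03:03:22", 2, "E"),
    ("2013-09-21 04:03:22", 2, "F"),
    ("2013-09-21 06:03:22", 2, "G"),
    ("2013-09-21 09:03:22", 2, "H"),
    ("2013-09-03 18:21:09", 3, "D"),
    ("2013-09-24 19:21:00", 2, "X"),
    ("2013-09-24 20:21:00", 2, "Y"),
    ("2013-09-24 21:21:00", 2, "Z"),
]

QUERY = """
with ts as (
  -- SQLite has no interval type: work in Unix seconds instead
  select user_id, programme, cast(strftime('%s', dt) as integer) as t
  from binge
),
start_grp as (
  -- 1 when a view starts more than 2 hours (7200 s) after the previous one
  select user_id, programme, t,
         case when t - lag(t) over (partition by user_id order by t) > 7200
              then 1 else 0 end as grp_start
  from ts
),
assign_grp as (
  -- running total of the start flags numbers each streak per user
  select user_id, programme, t,
         sum(grp_start) over (partition by user_id order by t) as grp
  from start_grp
)
select user_id, grp as binge, (max(t) - min(t)) / 3600.0 as binge_hours
from assign_grp
group by user_id, grp
having count(programme) > 1
order by user_id, grp
"""


def run(rows):
    """Load rows into an in-memory binge table and run the ported query."""
    con = sqlite3.connect(":memory:")
    con.execute("create table binge (dt text, user_id integer, programme text)")
    con.executemany("insert into binge values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(run(ROWS))
```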

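The binge numbers produced here are not necessarily consecutive and may not start at 1. They come from the running sum, so a user's first streak is numbered 0, and every single-programme group that the HAVING clause filters out leaves a gap. You can renumber them by applying the ROW_NUMBER function over the result of the final query.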
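Here is a minimal sketch of the ROW_NUMBER fix, again using SQLite from Python (version 3.25+ is needed for window functions). To keep it short, it starts from a hypothetical table `binge_result` with rows shaped like the final query's output. User 2's binges are numbered 1 and 3 there, the kind of gap the running sum can leave. ROW_NUMBER renumbers them 1 and 2, and a plain MAX then gives each user's longest binge, which is what the question ultimately asks for.

```python
import sqlite3

# Hypothetical rows shaped like the final query's output: (user_id, binge, hours).
# User 2's binges are numbered 1 and 3, as the running sum can leave them.
RESULT_ROWS = [(2, 1, 6.0), (2, 3, 2.0)]

con = sqlite3.connect(":memory:")
con.execute("create table binge_result (user_id integer, binge integer, hours real)")
con.executemany("insert into binge_result values (?, ?, ?)", RESULT_ROWS)

# ROW_NUMBER over the result renumbers each user's binges 1, 2, 3, ...
renumbered = con.execute("""
    select user_id,
           row_number() over (partition by user_id order by binge) as binge_no,
           hours
    from binge_result
    order by user_id, binge_no
""").fetchall()

# The longest binge per user is a plain aggregate over the same rows.
longest = con.execute("""
    select user_id, max(hours) as longest_hours
    from binge_result
    group by user_id
""").fetchall()

print(renumbered)  # [(2, 1, 6.0), (2, 2, 2.0)]
print(longest)     # [(2, 6.0)]
con.close()
```

The renumbered rows match the shape of the end result asked for in the question: binge 1 lasting 6 hours and binge 2 lasting 2 hours.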

Demo at

10-21 09:00