BigQuery: selecting data within a time interval

Question

My data looks like this:

Name | From  | To_City | Date of request
Andy | Paris | London  | 08/21/2014 12:00
Lena | Koln  | Berlin  | 08/22/2014 18:00
Andy | Paris | London  | 08/22/2014 06:00
Lisa | Rome  | Neapel  | 08/25/2014 18:00
Lena | Rome  | London  | 08/21/2014 20:00
Lisa | Rome  | Neapel  | 08/24/2014 18:00
Andy | Paris | London  | 08/25/2014 12:00

I want to find out how many identical drive requests a person made within +/- one day. Ideally I would get a table like this:

Name | From  | To_City | Avg. date of requests | # of requests
Andy | Paris | London  | 08/21/2014 21:00      | 2
Lena | Koln  | Berlin  | 08/22/2014 18:00      | 1
Lisa | Rome  | Neapel  | 08/25/2014 06:00      | 2
Lena | Rome  | London  | 08/21/2014 20:00      | 1
Andy | Paris | London  | 08/25/2014 12:00      | 1

This would be the result of a GROUP BY clause. But is it feasible in general to write a condition that checks whether there are identical requests within 24 hours of an initial request, and how many? So far I have been downloading the data into Excel and doing it there, but there is a lot of data, so that is not efficient.

Sample data:

Let's build a sample dataset first:

select * from (select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
(select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
(select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date)

Recommended answer

One way to do it is to use a window function with a RANGE window. To do that, the dates first need to be converted to day numbers, because RANGE requires the ordering column to be numeric. The PARTITION BY clause is similar to GROUP BY: it lists the columns that define "identical" drive requests (in your case name, from and to). Then you can simply use COUNT(*) to count the requests that fall inside the window, that is, on the same day, the previous day, or the following day.

select name, f, to, date, count(*)
  over(partition by name, f, to
       order by day
       range between 1 preceding and 1 following) from (
select name, f, to, date, integer(timestamp(date)/1000000/60/60/24) day from
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
(select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
(select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date))
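
To sanity-check what the window in the query above should return without running BigQuery, the same counting rule can be sketched in plain Python. This is only an illustration, not part of the original answer: the names `ROWS`, `count_nearby` and the `exact` flag are made up here, and the timestamps are assumed to be UTC. With `exact=False` it mirrors the query's day buckets (same partition, day number differing by at most 1); with `exact=True` it instead compares the actual timestamps against a 24-hour limit, which is closer to how the question is worded.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"

# Sample rows from the question: (name, from, to, request time).
ROWS = [
    ("Andy", "Paris", "London", "2014-08-21 12:00"),
    ("Lena", "Koln", "Berlin", "2014-08-22 18:00"),
    ("Andy", "Paris", "London", "2014-08-22 06:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-25 18:00"),
    ("Lena", "Rome", "London", "2014-08-21 20:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-24 18:00"),
    ("Andy", "Paris", "London", "2014-08-25 12:00"),
]


def count_nearby(rows, exact=False):
    """For each row, count the rows with the same (name, from, to) whose
    request lies within one day of it, the row itself included."""
    parsed = [(r[:3], datetime.strptime(r[3], FMT)) for r in rows]
    counts = []
    for key, ts in parsed:
        n = 0
        for other_key, other_ts in parsed:
            if other_key != key:
                continue  # different partition, never counted
            if exact:
                # True 24-hour distance between the two timestamps.
                near = abs(ts - other_ts) <= timedelta(days=1)
            else:
                # Day buckets, like ordering by the truncated day number.
                near = abs(ts.toordinal() - other_ts.toordinal()) <= 1
            if near:
                n += 1
        counts.append(n)
    return counts


if __name__ == "__main__":
    for row, n in zip(ROWS, count_nearby(ROWS)):
        print(*row, n)
```

On the sample data both modes give the same counts. They can differ in general: two requests at 00:00 on one day and 23:00 on the next are 47 hours apart, yet they sit in adjacent day buckets and so count as neighbours under the day-based rule.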

