Problem description
My data looks like this:
I want to find how many identical drive requests a person made within +/- one day. I would like to get a table saying:
This would be the result of a GROUP BY clause. But is it generally feasible to write a condition that checks whether, and how many, identical requests exist within 24 hours of an initial request? For now I download the data into Excel and do it there, but there is a lot of data, so this is not efficient.
Sample data:
Let's build a sample dataset first:
select * from (select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
(select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
(select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date)
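One way to do this in BigQuery is to use a window function with a RANGE window. To do that, the dates first need to be converted to day numbers, because RANGE requires the ordering column to be sequential numbers. The PARTITION BY clause is similar to GROUP BY: it lists the columns that define "identical" drive requests (in your case name, f and to). Then you can simply use COUNT(*) to count the number of requests that fall within such a window. Note that this compares whole calendar days rather than exact 24-hour spans, so two requests on adjacent days are counted together even if they are more than 24 hours apart.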
select name, f, to, date, count(*)
over(partition by name, f, to
order by day
range between 1 preceding and 1 following) from (
select name, f, to, date, integer(timestamp(date)/1000000/60/60/24) day from
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
(select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
(select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date))
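To make the window logic easier to check by hand, here is a minimal sketch in plain Python (not BigQuery code) that mimics what the query above is described as doing: bucket each timestamp into a whole-day number, partition by (name, f, to), and count the rows whose day number lies within 1 of the current row. The helper names day_number and count_nearby_requests are hypothetical, and the sketch assumes the timestamps are in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Same sample rows as in the query above: (name, f, to, date).
ROWS = [
    ("Andy", "Paris", "London", "2014-08-21 12:00"),
    ("Lena", "Koln", "Berlin", "2014-08-22 18:00"),
    ("Andy", "Paris", "London", "2014-08-22 06:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-25 18:00"),
    ("Lena", "Rome", "London", "2014-08-21 20:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-24 18:00"),
    ("Andy", "Paris", "London", "2014-08-25 12:00"),
]


def day_number(ts: str) -> int:
    """Whole days since the Unix epoch, treating the timestamp as UTC (assumption)."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() // 86400)


def count_nearby_requests(rows):
    """For each row, count rows in the same (name, f, to) partition
    whose day number differs from this row's by at most 1."""
    days_by_key = defaultdict(list)
    for name, f, to, date in rows:
        days_by_key[(name, f, to)].append(day_number(date))

    result = []
    for name, f, to, date in rows:
        day = day_number(date)
        # Equivalent of: RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING over day
        n = sum(1 for d in days_by_key[(name, f, to)] if abs(d - day) <= 1)
        result.append((name, f, to, date, n))
    return result


if __name__ == "__main__":
    for row in count_nearby_requests(ROWS):
        print(row)
```

With the sample rows, Andy's two Paris to London requests on 08-21 and 08-22 each get a count of 2, his request on 08-25 gets 1, both of Lisa's requests get 2, and each of Lena's requests gets 1.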