本文介绍了如何使用Presto将时间列分为5秒间隔并计数行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我正在使用Presto和Zeppelin.有很多原始数据.我必须总结这些数据.
I am using Presto and Zeppelin. There are a lot of raw datas.I have to summarize those datas.
我想每5秒分组一次.
serviceType logType date
------------------------------------------------------
service1 log1 2017-10-24 23:00:23.206
service1 log1 2017-10-24 23:00:23.207
service1 log1 2017-10-24 23:00:25.206
service2 log1 2017-10-24 23:00:24.206
service1 log2 2017-10-24 23:00:27.206
service1 log2 2017-10-24 23:00:29.302
然后结果
serviceType logType date cnt
--------------------------------------------------------------
service1 log1 2017-10-24 23:00:20 2
service2 log1 2017-10-24 23:00:20 1
service1 log1 2017-10-24 23:00:25 1
service1 log2 2017-10-24 23:00:25 2
首先,我必须将存储的数据迁移到新表中.
first, I have to migrate stored datas to new tables.
第二,我必须对数据进行分组并实时保存到新表中.
second, I have to group datas and save to the new table realtime.
很难编写sql脚本.
请帮助我.
我必须使用python解释器吗?
Do I have to use python interpreter?
推荐答案
您可以
- 使用
date_trunc
丢弃timestamp
的毫秒部分 - 您可以使用
ts - interval '1' second * (second(ts) % 5)
将不带毫秒部分的timestamp
舍入为5秒
- discard millisecond part of a
timestamp
withdate_trunc
- you can round a
timestamp
without millisecond part to 5 seconds withts - interval '1' second * (second(ts) % 5)
将它们放在一起的示例:
Example putting this together:
presto> SELECT ts_rounded, count(*)
-> FROM (
-> SELECT date_trunc('second', ts) - interval '1' second * (second(ts) % 5) AS ts_rounded
-> FROM (VALUES timestamp '2017-10-24 23:01:20.206',
-> timestamp '2017-10-24 23:01:23.206',
-> timestamp '2017-10-24 23:01:23.207',
-> timestamp '2017-10-24 23:01:26.206') AS t(ts)
-> )
-> GROUP BY ts_rounded ORDER BY ts_rounded;
ts_rounded | _col1
-------------------------+-------
2017-10-24 23:01:20.000 | 3
2017-10-24 23:01:25.000 | 1
(2 rows)
这篇关于如何使用Presto将时间列分为5秒间隔并计数行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!