问题描述
我跟踪网络访问者.我存储了IP地址以及访问的时间戳.
I track web visitors. I store the IP address as well as the timestamp of the visit.
ip_address time_stamp
180.2.79.3 1301654105
180.2.79.3 1301654106
180.2.79.3 1301654354
180.2.79.3 1301654356
180.2.79.3 1301654358
180.2.79.3 1301654366
180.2.79.3 1301654368
180.2.79.3 1301654422
我有一个查询以获取总曲目:
I have a query to get total tracks:
SELECT COUNT(*) AS tracks FROM tracking
但是,我现在要忽略每次访问后10秒内多次访问过的用户的访问.由于我不认为这是另一次访问,因此它仍然是第一次访问的一部分.
However, I now want to disregard visits from users that have visited multiple times within 10 seconds of each visit. Since I don't consider this another visit, its still part of the first visit.
我很难将其放入SQL查询表单中,对此我将不胜感激!
I am having difficulty in putting this into a SQL query form, I would appreciate any help on this!
推荐答案
让我从此表开始.我将使用普通时间戳记,以便我们可以轻松了解发生了什么.
Let me start with this table. I'll use ordinary timestamps so we can easily see what's going on.
180.2.79.3 2011-01-01 08:00:00
180.2.79.3 2011-01-01 08:00:09
180.2.79.3 2011-01-01 08:00:20
180.2.79.3 2011-01-01 08:00:23
180.2.79.3 2011-01-01 08:00:25
180.2.79.3 2011-01-01 08:00:40
180.2.79.4 2011-01-01 08:00:00
180.2.79.4 2011-01-01 08:00:13
180.2.79.4 2011-01-01 08:00:23
180.2.79.4 2011-01-01 08:00:25
180.2.79.4 2011-01-01 08:00:27
180.2.79.4 2011-01-01 08:00:29
180.2.79.4 2011-01-01 08:00:50
如果我对您的理解正确,那么您想这样算.
If I understand you correctly, you want to count these like this.
180.2.79.3 3
180.2.79.4 3
您可以通过选择两个的最大时间戳来为每个ip_address做到这一点
You can do that for each ip_address by selecting the maximum timestamp that is both
- 大于当前行的时间戳记和
- 比当前行的时间戳大不超过10秒.
- greater than the current row'stimestamp, and
- less than or equal to 10 seconds greater than the current row's timestamp.
将这两个条件结合起来会引入一些空值,这些空值确实非常有用.
Taking these two criteria together will introduce some nulls, which turn out to be really useful.
select ip_address,
t_s.time_stamp,
(select max(t.time_stamp)
from t_s t
where t.ip_address = t_s.ip_address
and t.time_stamp > t_s.time_stamp
and t.time_stamp - t_s.time_stamp <= interval '10' second) next_page
from t_s
group by ip_address, t_s.time_stamp
order by ip_address, t_s.time_stamp;
ip_address time_stamp next_page
180.2.79.3 2011-01-01 08:00:00 2011-01-01 08:00:09
180.2.79.3 2011-01-01 08:00:09 <null>
180.2.79.3 2011-01-01 08:00:20 2011-01-01 08:00:25
180.2.79.3 2011-01-01 08:00:23 2011-01-01 08:00:25
180.2.79.3 2011-01-01 08:00:25 <null>
180.2.79.3 2011-01-01 08:00:40 <null>
180.2.79.4 2011-01-01 08:00:00 <null>
180.2.79.4 2011-01-01 08:00:13 2011-01-01 08:00:23
180.2.79.4 2011-01-01 08:00:23 2011-01-01 08:00:29
180.2.79.4 2011-01-01 08:00:25 2011-01-01 08:00:29
180.2.79.4 2011-01-01 08:00:27 2011-01-01 08:00:29
180.2.79.4 2011-01-01 08:00:29 <null>
180.2.79.4 2011-01-01 08:00:50 <null>
标记访问结束的时间戳记其自己的next_page为空.这是因为该行的时间戳不小于或等于time_stamp + 10秒.
The timestamp that marks the end of a visit has a null for its own next_page. That's because no timestamp is less than or equal to time_stamp + 10 seconds for that row.
要获得计数,我可能会创建一个视图并计算空值.
To get a count, I'd probably create a view and count the nulls.
select ip_address, count(*)
from t_s_visits
where next_page is null
group by ip_address
180.2.79.3 3
180.2.79.4 3
这篇关于计算彼此相距10秒以内的行数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!