I have a table like this:
Item Selected   Session ID   Created
A               1            2017-11-25T02:22:23
B               1            2017-11-25T02:22:24
B               1            2017-11-25T02:22:25
C               1            2017-11-25T02:22:17
D               1            2017-11-25T02:22:27
A               2            2017-11-25T02:22:28
C               2            2017-11-25T02:22:30
D               2            2017-11-25T02:22:06
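Given item A, I want to find the next 3-5 items that are most likely to be selected during the same session ID, across all sessions.
In other words, after a user selects item A, which item do they most often select next?
The preferred output for an Item A query would look something like: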
2nd Item Selected   Percent of time selected
B                   33%
C                   33%
D                   33%
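The 33% figures can be reproduced under one reading of the question. Take the distinct (session, item) pairs selected after A in the same session, then report each item's share of all those pairs.

Below is a minimal sketch of that reading. It uses Python's built-in sqlite3 module on the sample rows. The following are all assumptions for illustration, not part of the question:

- the table name `sessions`;
- the column names `item`, `session_id` and `created`;
- measuring "after" from the first A in each session;
- the `next_item_share` helper.

```python
import sqlite3

# Sample rows copied from the question's table.
# Assumed (hypothetical) schema: sessions(item TEXT, session_id INTEGER, created TEXT).
ROWS = [
    ("A", 1, "2017-11-25T02:22:23"),
    ("B", 1, "2017-11-25T02:22:24"),
    ("B", 1, "2017-11-25T02:22:25"),
    ("C", 1, "2017-11-25T02:22:17"),
    ("D", 1, "2017-11-25T02:22:27"),
    ("A", 2, "2017-11-25T02:22:28"),
    ("C", 2, "2017-11-25T02:22:30"),
    ("D", 2, "2017-11-25T02:22:06"),
]

# For each session, find the first selection of the target item, collect the
# distinct other items selected later in that session, and report each item's
# share of all such (session, item) pairs as a rounded percentage.
QUERY = """
WITH first_hit AS (
    SELECT session_id, MIN(created) AS hit_created
    FROM sessions
    WHERE item = ?
    GROUP BY session_id
),
after_hit AS (
    SELECT DISTINCT s.session_id, s.item
    FROM sessions AS s
    JOIN first_hit AS f ON s.session_id = f.session_id
    WHERE s.created > f.hit_created
      AND s.item <> ?
)
SELECT item,
       ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM after_hit)) AS pct
FROM after_hit
GROUP BY item
ORDER BY pct DESC, item
LIMIT ?
"""


def next_item_share(rows, target, top_n=5):
    """Return [(item, percent), ...] for items chosen after `target` (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sessions (item TEXT, session_id INTEGER, created TEXT)"
        )
        conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)
        # ISO-8601 timestamps in a single format compare correctly as text.
        return conn.execute(QUERY, (target, target, top_n)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for item, pct in next_item_share(ROWS, "A"):
        print(f"{item}\t{pct:.0f}%")
```

For item A this yields B, C and D at 33 each, matching the table above. If the denominator were instead the number of sessions containing A (two), each item would get 50%.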
Is this possible in SQL?
Edit: this is my current solution, but it doesn't work in BigQuery. I've posted the exact code, with only the table name changed:
select `tag_touched`, count(*) / numsessions as ratio
from (select s.`session_id`, `tag_touched`, max(created) as
maxcreated, a.maxcreated_a, ss.numsessions
from [TABLENAME] s join
(select s.`session_id`, max(s.Created) as maxcreated_a
from [TABLENAME] s
where `tag_touched` = 'A'
group by s.`session_id`,
) a
on s.`session_id` = a.`session_id` cross join
(select count(distinct `session_id`) as numsessions
from [TABLENAME]
where `tag_touched` = 'A'
) ss
group by s.`session_id`, s.`tag_touched`, a.maxcreated_a, ss.numsessions
having max(created) > maxcreated_a
) s
group by `tag_touched`;
But I get this error:
Error: Expression '`tag_touched`' is not present in the GROUP BY list
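Any advice?
Best answer
You can use aggregation to find which items were selected after a given item within the same session. The query below returns, for each item, the fraction of sessions containing 'A' in which that item was selected after the last 'A'. I think this may be enough for your needs: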
select item, count(*) / numsessions as ratio
from (select s.sessionId, s.item, max(s.created) as maxcreated, a.maxcreated_a, ss.numsessions
from sessions s join
(select sessionId, max(created) as maxcreated_a
from sessions s
where item = 'A'
group by sessionId
) a
on s.sessionId = a.sessionId cross join
(select count(distinct sessionId) as numsessions
from sessions
where item = 'A'
) ss
group by s.sessionId, s.item, a.maxcreated_a, ss.numsessions
having max(created) > maxcreated_a -- appeared after the last "A"
) s
group by item, numsessions;  -- standard SQL (e.g. BigQuery) requires numsessions to be grouped or aggregated
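To check the answer's logic locally, here is a sketch that runs it against the sample rows with Python's sqlite3 module. It rests on these assumptions:

- The table is named `sessions`, with columns `item`, `sessionId` and `created` as used in the answer.
- `1.0 *` is added because SQLite's `/` truncates when both operands are integers (MySQL and BigQuery return a fractional result).
- `numsessions` is included in the outer GROUP BY.
- An `order by` is added only to make the output deterministic.
- The `run_answer_query` helper is hypothetical.

```python
import sqlite3

# Sample rows from the question, as (item, sessionId, created).
ROWS = [
    ("A", 1, "2017-11-25T02:22:23"),
    ("B", 1, "2017-11-25T02:22:24"),
    ("B", 1, "2017-11-25T02:22:25"),
    ("C", 1, "2017-11-25T02:22:17"),
    ("D", 1, "2017-11-25T02:22:27"),
    ("A", 2, "2017-11-25T02:22:28"),
    ("C", 2, "2017-11-25T02:22:30"),
    ("D", 2, "2017-11-25T02:22:06"),
]

# The answer's query, adapted for SQLite: "1.0 *" forces real division,
# numsessions is grouped in the outer query, and rows are ordered by item.
ANSWER_QUERY = """
select item, 1.0 * count(*) / numsessions as ratio
from (select s.sessionId, s.item, max(s.created) as maxcreated, a.maxcreated_a, ss.numsessions
      from sessions s join
           (select sessionId, max(created) as maxcreated_a
            from sessions s
            where item = 'A'
            group by sessionId
           ) a
           on s.sessionId = a.sessionId cross join
           (select count(distinct sessionId) as numsessions
            from sessions
            where item = 'A'
           ) ss
      group by s.sessionId, s.item, a.maxcreated_a, ss.numsessions
      having max(created) > maxcreated_a -- appeared after the last "A"
     ) s
group by item, numsessions
order by item
"""


def run_answer_query(rows):
    """Load rows into an in-memory table and run the answer's query (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sessions (item TEXT, sessionId INTEGER, created TEXT)"
        )
        conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)
        return conn.execute(ANSWER_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for item, ratio in run_answer_query(ROWS):
        print(item, ratio)
```

On the sample data, B, C and D each get a ratio of 0.5. Each of them appears after the last A in one of the two sessions that contain A. The question's 33% instead divides by the three (session, item) pairs.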
Regarding "mysql - Most common second query (given the first) - SQL grouping", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49114739/