Problem description
I have a DB consisting of events with timestamps:
row  eventName  taskName  timestamp  userName
1    fail       ABC       10.5       John
2    fail       ABC       18.0       John
3    fail       ABC       19.0       Mike
4    fail       XYZ       21.0       John
5    fail       XYZ       23.0       Mike
6    success    ABC       25.0       John
7    fail       ABC       26.0       John
8    success    ABC       28.0       John
I'd like to count the number of failures until the first success, per user (and average, but that's beyond this question).
In the example above, John attempted task ABC 2 times (rows 1 & 2) until succeeding (row 6). Subsequent failures and successes can be ignored.
I think I could achieve this by counting the number of rows with "ABC" and "fail" whose timestamp is earlier than the earliest timestamp in all rows with "ABC" and "success", grouping by userName. How do I express this in T-SQL? Specifically, Vertica.
This seemed very similar to the case here: "sql count/sum the number of calls until a specific date in another column"
But when I tried to adapt the code from https://stackoverflow.com/a/39594686/4354459 as below, I think I got something wrong because I keep getting larger counts than expected.
WITH
Successes
AS
(
    SELECT
        events.userName
        ,events.taskName
        ,MIN(events.timestamp) AS FirstSuccessTime
    FROM events
    WHERE events.eventName = 'success'
    GROUP BY events.userName, events.taskName
)
SELECT
    events.userName
    ,events.taskName
    ,COUNT(events.eventName) AS FailuresUntilFirstSuccess
FROM
    Successes
    LEFT JOIN events
        ON events.taskName = Successes.taskName
        AND events.timestamp < Successes.FirstSuccessTime
        AND events.eventName = 'fail'
GROUP BY events.userName, events.taskName
;
Recommended answer
One more solution using CTEs. The idea is to first find the minimum row with a success for each user and task. After that, we can filter out the other rows with a simple condition such as row < min for each user:
-- Sample data from the question, plus a success for Mike (row 9).
-- The timestamp column is omitted; the row number stands in for event order.
DECLARE @events TABLE([row] INT, eventName VARCHAR(50), taskName VARCHAR(50), userName VARCHAR(50))
INSERT INTO @events
VALUES (1,'fail','ABC','John')
    ,(2,'fail','ABC','John')
    ,(3,'fail','ABC','Mike')
    ,(4,'fail','XYZ','John')
    ,(5,'fail','XYZ','Mike')
    ,(6,'success','ABC','John')
    ,(7,'fail','ABC','John')
    ,(8,'success','ABC','John')
    ,(9,'success','ABC','Mike')
SELECT * FROM @events;
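WITH
cte
AS (
    -- First successful row per user and task
    SELECT userName
        ,taskName
        ,MIN([row]) AS [min]
    FROM @events
    WHERE eventName = 'success'
    GROUP BY userName, taskName)
SELECT e.userName
    ,e.taskName
    ,COUNT(1) AS attempts
FROM @events e
JOIN cte c ON e.userName = c.userName
    AND e.taskName = c.taskName  -- match on task too, so failures on other tasks are not counted
WHERE e.[row] < c.[min]
    AND e.eventName = 'fail'     -- count failures only
GROUP BY e.userName, e.taskName;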
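The same idea can also be written against the timestamp column from the question instead of the row number. The following is only a minimal sketch under stated assumptions: it uses SQLite through Python's sqlite3 module as a stand-in for Vertica, and the names first_success, failures_until_first_success and rowNum are illustrative rather than taken from the original posts. The CTE and join themselves use only standard SQL, but they have not been checked against Vertica here.

```python
import sqlite3

# Sample rows from the question: (rowNum, eventName, taskName, timestamp, userName)
ROWS = [
    (1, "fail", "ABC", 10.5, "John"),
    (2, "fail", "ABC", 18.0, "John"),
    (3, "fail", "ABC", 19.0, "Mike"),
    (4, "fail", "XYZ", 21.0, "John"),
    (5, "fail", "XYZ", 23.0, "Mike"),
    (6, "success", "ABC", 25.0, "John"),
    (7, "fail", "ABC", 26.0, "John"),
    (8, "success", "ABC", 28.0, "John"),
]

QUERY = """
WITH first_success AS (
    -- earliest success per user and task
    SELECT userName, taskName, MIN(timestamp) AS firstSuccessTime
    FROM events
    WHERE eventName = 'success'
    GROUP BY userName, taskName
)
SELECT s.userName, s.taskName, COUNT(e.eventName) AS failuresUntilFirstSuccess
FROM first_success AS s
LEFT JOIN events AS e
       ON e.userName = s.userName      -- same user ...
      AND e.taskName = s.taskName      -- ... and same task
      AND e.eventName = 'fail'
      AND e.timestamp < s.firstSuccessTime
GROUP BY s.userName, s.taskName
ORDER BY s.userName, s.taskName
"""


def failures_until_first_success(rows):
    """Load rows into an in-memory SQLite table and run QUERY on them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (rowNum INTEGER, eventName TEXT, "
        "taskName TEXT, timestamp REAL, userName TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(failures_until_first_success(ROWS))  # [('John', 'ABC', 2)]
```

Grouping by the columns of first_success together with the LEFT JOIN keeps a user who succeeds on the first attempt in the result with a count of 0. The query in the question joins on taskName only, so each failure is matched against every user's first success on that task, which inflates the counts as soon as more than one user has succeeded.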