T-SQL: Counting failures until the first success

Question

I have a DB consisting of events with timestamps:

row eventName taskName timestamp userName
1   fail      ABC      10.5      John
2   fail      ABC      18.0      John
3   fail      ABC      19.0      Mike
4   fail      XYZ      21.0      John
5   fail      XYZ      23.0      Mike
6   success   ABC      25.0      John
7   fail      ABC      26.0      John
8   success   ABC      28.0      John

I'd like to count the number of failures until the first success, per user (and then average them, but that's beyond this question).

In the example above, John attempted task ABC 2 times (rows 1 and 2) before succeeding (row 6). Subsequent failures and successes can be ignored.

I think I could achieve this by counting the rows with "ABC" and "fail" whose timestamp is earlier than the earliest timestamp among all rows with "ABC" and "success", grouped by userName. How do I express this in T-SQL? Specifically, in Vertica.

This seemed very similar to the case here: sql count/sum the number of calls until a specific date in another column

But when I tried to adapt the code from https://stackoverflow.com/a/39594686/4354459 as shown below, I must have got something wrong, because I keep getting larger counts than expected.

WITH
Successes
AS
(
    SELECT
        events.userName
        ,events.taskName
        ,MIN(events.timestamp) AS FirstSuccessTime
    FROM events
    WHERE events.eventName = 'success'
    GROUP BY events.userName, events.taskName
)
SELECT
    events.userName
    ,events.taskName
    ,COUNT(events.eventName) AS FailuresUntilFirstSuccess
FROM
    Successes
    LEFT JOIN events
        ON  events.taskName = Successes.taskName
        AND events.timestamp < Successes.FirstSuccessTime
        AND events.eventName = 'fail'
GROUP BY events.userName, events.taskName
;

Recommended answer

One more solution using CTEs. The idea is to first find the minimum row with a success for each user. After that, the other rows can be filtered out with a simple condition, such as row < min per user:

DECLARE @events TABLE([row] INT,eventName VARCHAR(50),taskName VARCHAR(50),userName VARCHAR(50))

INSERT INTO @events
        VALUES(1,'fail','ABC','John')
       ,(2,'fail','ABC','John')
       ,(3,'fail','ABC','Mike')
       ,(4,'fail','XYZ','John')
       ,(5,'fail','XYZ','Mike')
       ,(6,'success','ABC','John')
       ,(7,'fail','ABC','John')
       ,(8,'success','ABC','John')
       ,(9,'success','ABC','Mike')


SELECT * FROM @events;

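-- First success per user and task, identified by the lowest [row] value.
WITH cte AS (
    SELECT userName
          ,taskName
          ,MIN([row]) AS [min]
    FROM @events
    WHERE eventName = 'success'
    GROUP BY userName, taskName
)
-- Count the failures that precede that first success.
SELECT e.userName
      ,e.taskName
      ,COUNT(1) AS attempts
FROM @events e
JOIN cte c
    ON  e.userName = c.userName
    AND e.taskName = c.taskName   -- match the task too, so XYZ rows are not counted against an ABC success
WHERE e.[row] < c.[min]
  AND e.eventName = 'fail'
GROUP BY e.userName, e.taskName;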
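
As for why the query in the question overcounts: its LEFT JOIN matches on taskName and timestamp but never on userName, so one user's failures can be paired with another user's first success. The sketch below reruns that query with the missing condition added, on the sample rows from the question. It is only an illustration under stated assumptions: it uses SQLite through Python's standard sqlite3 module as a stand-in engine (not Vertica or SQL Server), and the function name failures_until_first_success and the table aliases s and e are made up for this example.

```python
import sqlite3

# Sample rows from the question: (row, eventName, taskName, timestamp, userName)
ROWS = [
    (1, "fail", "ABC", 10.5, "John"),
    (2, "fail", "ABC", 18.0, "John"),
    (3, "fail", "ABC", 19.0, "Mike"),
    (4, "fail", "XYZ", 21.0, "John"),
    (5, "fail", "XYZ", 23.0, "Mike"),
    (6, "success", "ABC", 25.0, "John"),
    (7, "fail", "ABC", 26.0, "John"),
    (8, "success", "ABC", 28.0, "John"),
]

# The question's query with one extra join condition on userName.
# Grouping by the Successes columns keeps users with zero prior failures
# in the result (count 0) instead of collapsing them into a NULL group.
QUERY = """
WITH Successes AS (
    SELECT userName, taskName, MIN(timestamp) AS FirstSuccessTime
    FROM events
    WHERE eventName = 'success'
    GROUP BY userName, taskName
)
SELECT s.userName, s.taskName, COUNT(e.eventName) AS FailuresUntilFirstSuccess
FROM Successes AS s
LEFT JOIN events AS e
    ON  e.userName  = s.userName      -- the condition missing from the question's query
    AND e.taskName  = s.taskName
    AND e.timestamp < s.FirstSuccessTime
    AND e.eventName = 'fail'
GROUP BY s.userName, s.taskName
ORDER BY s.userName, s.taskName
"""


def failures_until_first_success(rows):
    """Load rows into an in-memory SQLite table and run QUERY (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE events ("row" INTEGER, eventName TEXT, taskName TEXT, '
        "timestamp REAL, userName TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for user, task, failures in failures_until_first_success(ROWS):
        print(user, task, failures)
```

On the sample data this returns a single row for John and task ABC with a count of 2. Mike never succeeds in the question's data, so he does not appear. The SQL itself uses only a CTE, a LEFT JOIN and aggregates, so it should carry over to other engines with little change, but that has not been checked against Vertica here.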
