聚合函数来检测PostgreSQL中的趋势

聚合函数来检测PostgreSQL中的趋势

本文介绍了聚合函数来检测PostgreSQL中的趋势的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用psql db来存储数据结构,如下所示:

I'm using a psql db to store a datastructure like so:

datapoint(userId, rank, timestamp)

其中timestamp是Unix Epoch毫秒时间戳。

where timestamp is the Unix Epoch milliseconds timestamp.

在此结构中,我每天存储每个用户的排名,因此它是:

In this structure I store the rank of each user each day, so it's like:

UserId   Rank  Timestamp
1        1     1435366459
1        2     1435366458
1        3     1435366457
2        8     1435366456
2        6     1435366455
2        7     1435366454

因此,在上面的示例数据中,userId 1改进了它与每个测量的排名,这意味着它有一个积极的趋势,而userId 2 ,这意味着它有一个负趋势。

So, in the sample data above, userId 1 its improving it's rank with each measurement, which means it has a positive trend, while userId 2 is dropping in rank, which means it has a negative trend.

我需要做的是检测基于最后N次测量有积极趋势的所有用户。

What i need to do is to detect all users that have a positive trend based on the last N measurements.

任何人都可以提供一些提示如何开始/接近这个任务?
谢谢!

Can anyone provide some tips how to start/approach this task?thanks!

推荐答案

一种方法是对每个用户的排名执行线性回归,斜率为正或负。幸运的是,PostgreSQL有一个内置函数来做 - regr_slope

One approach would be to perform a linear regression on the each user's rank, and check if the slope is positive or negative. Luckily, PostgreSQL has a builtin function to do that - regr_slope:

SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
FROM     my_table
GROUP BY user_id

此查询为您提供基本功能。现在,如果你喜欢,你可以用 case 表达式来装饰它:

This query gives you the basic functionality. Now, you can dress it up a bit with case expressions if you like:

SELECT user_id,
       CASE WHEN slope > 0 THEN 'positive'
            WHEN slope < 0 THEN 'negative'
            ELSE 'steady' END AS trend
FROM   (SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
        FROM     my_table
        GROUP BY user_id) t

编辑:

不幸的是, regr_slope 没有内置的方式来处理top N类型的需求,所以这应该单独处理,例如通过 row_number


Unfortunately, regr_slope doesn't have a built in way to handle "top N" type requirements, so this should be handled separately, e.g., by a subquery with row_number:

-- Decoration outer query
SELECT user_id,
       CASE WHEN slope > 0 THEN 'positive'
            WHEN slope < 0 THEN 'negative'
            ELSE 'steady' END AS trend
FROM   (-- Inner query to calculate the slope
        SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
        FROM     (-- Inner query to get top N
                  SELECT user_id, rank1,
                         ROW_NUMER() OVER (PARTITION BY user_id
                                           ORDER BY timestamp1 DESC) AS rn
                  FROM   my_table) t
        WHERE    rn <= N -- Replace N with the number of rows you need
        GROUP BY user_id) t2

这篇关于聚合函数来检测PostgreSQL中的趋势的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-25 02:57