问题描述
我使用psql db来存储数据结构,如下所示:
I'm using a psql db to store a datastructure like so:
datapoint(userId, rank, timestamp)
其中timestamp是Unix Epoch毫秒时间戳。
where timestamp is the Unix Epoch milliseconds timestamp.
在此结构中,我每天存储每个用户的排名,因此它是:
In this structure I store the rank of each user each day, so it's like:
UserId Rank Timestamp
1 1 1435366459
1 2 1435366458
1 3 1435366457
2 8 1435366456
2 6 1435366455
2 7 1435366454
因此,在上面的示例数据中,userId 1改进了它与每个测量的排名,这意味着它有一个积极的趋势,而userId 2 ,这意味着它有一个负趋势。
So, in the sample data above, userId 1 its improving it's rank with each measurement, which means it has a positive trend, while userId 2 is dropping in rank, which means it has a negative trend.
我需要做的是检测基于最后N次测量有积极趋势的所有用户。
What i need to do is to detect all users that have a positive trend based on the last N measurements.
任何人都可以提供一些提示如何开始/接近这个任务?
谢谢!
Can anyone provide some tips how to start/approach this task?thanks!
推荐答案
一种方法是对每个用户的排名执行线性回归,斜率为正或负。幸运的是,PostgreSQL有一个内置函数来做 - regr_slope
:
One approach would be to perform a linear regression on the each user's rank, and check if the slope is positive or negative. Luckily, PostgreSQL has a builtin function to do that - regr_slope
:
SELECT user_id, regr_slope (rank1, timestamp1) AS slope
FROM my_table
GROUP BY user_id
此查询为您提供基本功能。现在,如果你喜欢,你可以用 case
表达式来装饰它:
This query gives you the basic functionality. Now, you can dress it up a bit with case
expressions if you like:
SELECT user_id,
CASE WHEN slope > 0 THEN 'positive'
WHEN slope < 0 THEN 'negative'
ELSE 'steady' END AS trend
FROM (SELECT user_id, regr_slope (rank1, timestamp1) AS slope
FROM my_table
GROUP BY user_id) t
编辑:
不幸的是, regr_slope
没有内置的方式来处理top N类型的需求,所以这应该单独处理,例如通过 row_number
:
Unfortunately, regr_slope
doesn't have a built in way to handle "top N" type requirements, so this should be handled separately, e.g., by a subquery with row_number
:
-- Decoration outer query
SELECT user_id,
CASE WHEN slope > 0 THEN 'positive'
WHEN slope < 0 THEN 'negative'
ELSE 'steady' END AS trend
FROM (-- Inner query to calculate the slope
SELECT user_id, regr_slope (rank1, timestamp1) AS slope
FROM (-- Inner query to get top N
SELECT user_id, rank1,
ROW_NUMER() OVER (PARTITION BY user_id
ORDER BY timestamp1 DESC) AS rn
FROM my_table) t
WHERE rn <= N -- Replace N with the number of rows you need
GROUP BY user_id) t2
这篇关于聚合函数来检测PostgreSQL中的趋势的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!