I have a database table that basically has the columns date Date, int UserId, double Value.
I'd like to run a query that gives me the 10th and 90th percentiles of Value across all users for each date, for example SELECT Date, Pct10(Value), Pct90(Value) from Table group by Date.
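I know of several ways to calculate percentiles in MySQL using Count(*) and LIMIT, but they all rely on counting rows, and I can't see how to apply them to each date value within a single statement.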
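For a single date, the row-counting approach mentioned here might look like the sketch below. It assumes the nearest-rank definition of a percentile and uses Python's sqlite3 module with an in-memory table purely so the example is self-contained; the table name mydata and the helper names make_sample_db and percentile_for_date are hypothetical. The offset is computed in a separate step because MySQL's LIMIT accepts only constants or prepared-statement placeholders, which is also why this method is awkward to repeat for every date in one statement.

```python
import sqlite3

# Sample rows from the question: (Date, UserId, Value).
SAMPLE_ROWS = (
    [("2013-01-01", user_id, value)
     for user_id, value in enumerate([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 9])]
    + [("2013-01-02", 1, 1), ("2013-01-02", 9, 1)]
)


def make_sample_db():
    """Create an in-memory table holding the sample rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mydata (Date TEXT, UserId INTEGER, Value REAL)")
    conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def percentile_for_date(conn, date, pct):
    """Nearest-rank percentile of Value for one date, via COUNT(*) and LIMIT."""
    # Step 1: count the rows for this date.
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM mydata WHERE Date = ?", (date,)
    ).fetchone()
    if n == 0:
        return None
    # Step 2: 1-based rank ceil(pct * n / 100), done in integer arithmetic,
    # then converted to a 0-based OFFSET.
    offset = max(1, (pct * n + 99) // 100) - 1
    # Step 3: skip to that row in sorted order.
    row = conn.execute(
        "SELECT Value FROM mydata WHERE Date = ? ORDER BY Value LIMIT 1 OFFSET ?",
        (date, offset),
    ).fetchone()
    return row[0]


conn = make_sample_db()
print(percentile_for_date(conn, "2013-01-01", 10))
print(percentile_for_date(conn, "2013-01-01", 90))
```

Each call handles exactly one date and one percentile, so covering the whole table this way means one round of queries per date.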
Example data:
Date | UserId | Value
2013-01-01 | 0 | 1
2013-01-01 | 1 | 1
2013-01-01 | 2 | 1
2013-01-01 | 3 | 1
2013-01-01 | 4 | 2
2013-01-01 | 5 | 2
2013-01-01 | 6 | 2
2013-01-01 | 7 | 2
2013-01-01 | 8 | 2
2013-01-01 | 9 | 2
2013-01-01 | 10 | 9
2013-01-02 | 1 | 1
2013-01-02 | 9 | 1
The expected result would be:
Date | Pct10 | Pct90
2013-01-01 | 1 | 2
2013-01-02 | 1 | 1
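The expected values above are consistent with the nearest-rank definition of a percentile (take the value at rank ceil(p * n / 100) in sorted order). The question does not say which definition is intended, so that choice is an assumption; the following plain-Python sketch, with hypothetical helper names, reproduces the table under it.

```python
from collections import defaultdict

# Sample rows from the question: (Date, UserId, Value).
SAMPLE_ROWS = (
    [("2013-01-01", user_id, value)
     for user_id, value in enumerate([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 9])]
    + [("2013-01-02", 1, 1), ("2013-01-02", 9, 1)]
)


def nearest_rank(sorted_values, pct):
    """Value at 1-based rank ceil(pct * n / 100); pct is an integer 1..100."""
    n = len(sorted_values)
    rank = max(1, (pct * n + 99) // 100)  # integer ceiling, no float rounding
    return sorted_values[rank - 1]


def percentiles_by_date(rows, pcts=(10, 90)):
    """Group values by date, then take each requested percentile per group."""
    by_date = defaultdict(list)
    for date, _user_id, value in rows:
        by_date[date].append(value)
    return {
        date: tuple(nearest_rank(sorted(values), p) for p in pcts)
        for date, values in sorted(by_date.items())
    }


print(percentiles_by_date(SAMPLE_ROWS))
```

For 2013-01-01 there are 11 values, so the 10th percentile is the 2nd smallest (1) and the 90th is the 10th smallest (2); the single outlier 9 is never selected.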
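Best answer
I'm not sure about getting the percentile. I'm using subqueries based on "select nth percentile from mysql", but I'm not sure I modified them correctly. My answer is about combining the subqueries.
The query below will be slow, and will get slower as the table grows, but it should give you what you need: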
SELECT p10.Date, Pct10, Pct90
FROM (
    SELECT Date, count(Value) AS Pct10
    FROM mydata
    GROUP BY Date, Value
    ORDER BY ABS(0.1-(count(Value)/(select count(*) from mydata)))
    LIMIT 1) AS p10
INNER JOIN (
    SELECT Date, count(Value) AS Pct90
    FROM mydata
    GROUP BY Date, Value
    ORDER BY ABS(0.9-(count(Value)/(select count(*) from mydata)))
    LIMIT 1) AS p90 ON p10.Date = p90.Date
GROUP BY p10.Date
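Here is my second idea. If it works, it will be faster and more efficient than the first one I listed, but it will still be fairly slow on larger tables.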
SELECT p10.Date, count(Value) AS Pct10, Pct90
FROM mydata p10
INNER JOIN (
    SELECT Date, count(Value) AS Pct90
    FROM mydata
    GROUP BY Date, Value
    ORDER BY ABS(0.9-(count(Value)/(select count(*) from mydata)))
    LIMIT 1) AS p90 ON p10.Date = p90.Date
GROUP BY p10.Date, Value
ORDER BY ABS(0.1-(count(Value)/(select count(*) from mydata)))
LIMIT 1
Edit
OK, brainstorming. Given that this is the subquery for the percentile of a single date (I'm not even sure how this works):
SELECT Date, count(Value) AS Pct90
FROM mydata
WHERE Date = ?
GROUP BY Value
ORDER BY ABS(0.9-(count(Value)/(select count(*) from mydata WHERE Date = ?)))
LIMIT 1
Then let's try to fix the ORDER BY so that it uses the row count of each date:
SELECT mydata.Date, count(Value) AS Pct90
FROM mydata
INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
    ON d.Date = mydata.Date
GROUP BY mydata.Date, Value
ORDER BY (ABS(0.9-(COUNT(Value)/d.DateTotal)))
LIMIT 1
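If you use this pattern in my earlier examples, maybe it will work.
Edit 2
So, here we go again, because we can't use LIMIT 1 (I should have realized that). I actually tested the following on my own database (hopefully I changed all the field and table names back to what they should be!), and it seems to work. You will have to do this again for p10 and combine the two.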
--- removed due to typos ---
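Edit 3
I found some errors in Edit 2, so I removed it. Here is the entire percentile query. As far as I can tell, this query works on my database (with different fields and tables).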
SELECT n.Date, n.Pct AS Pct10, n.Value AS Pct10Value, q.Pct AS Pct90, q.Value AS Pct90Value FROM (
    SELECT p.Date, p.Pct, p.Value, m.Selector FROM (
        -- per date and value: row count and distance of its share from 0.1
        SELECT mydata.Date, Value, COUNT(Value) AS Pct, (ABS(0.1-(COUNT(Value)/d.DateTotal))) AS Abs10
        FROM mydata
        INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
            ON d.Date = mydata.Date
        GROUP BY mydata.Date, Value
    ) p
    INNER JOIN (
        -- smallest distance per date
        SELECT Date, MIN(Abs10) AS Selector FROM (
            SELECT mydata.Date, Value, COUNT(Value) AS Pct, (ABS(0.1-(COUNT(Value)/d.DateTotal))) AS Abs10
            FROM mydata
            INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
                ON d.Date = mydata.Date
            GROUP BY mydata.Date, Value
        ) x GROUP BY Date
    ) AS m ON m.Selector = p.Abs10 AND m.Date = p.Date
    GROUP BY p.Date) n
INNER JOIN (
    SELECT p.Date, p.Pct, p.Value, m.Selector FROM (
        -- same as above, with 0.9 in place of 0.1
        SELECT mydata.Date, Value, COUNT(Value) AS Pct, (ABS(0.9-(COUNT(Value)/d.DateTotal))) AS Abs90
        FROM mydata
        INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
            ON d.Date = mydata.Date
        GROUP BY mydata.Date, Value
    ) p
    INNER JOIN (
        SELECT Date, MIN(Abs90) AS Selector FROM (
            SELECT mydata.Date, Value, COUNT(Value) AS Pct, (ABS(0.9-(COUNT(Value)/d.DateTotal))) AS Abs90
            FROM mydata
            INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
                ON d.Date = mydata.Date
            GROUP BY mydata.Date, Value
        ) x GROUP BY Date
    ) AS m ON m.Selector = p.Abs90 AND m.Date = p.Date
    GROUP BY p.Date) q ON q.Date = n.Date
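For comparison, on servers that support window functions (MySQL 8.0 and later), a per-date nearest-rank percentile can be sketched without LIMIT at all. This is not the query from the answer above but an alternative technique, and the nearest-rank definition is an assumption. It is run here through Python's sqlite3 module (SQLite 3.25 or newer assumed) only so the example is self-contained; the table name mydata and the sample rows come from the question, while the helper name is hypothetical.

```python
import sqlite3

# Sample rows from the question: (Date, UserId, Value).
SAMPLE_ROWS = (
    [("2013-01-01", user_id, value)
     for user_id, value in enumerate([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 9])]
    + [("2013-01-02", 1, 1), ("2013-01-02", 9, 1)]
)

# rn is the 1-based rank of a row within its date, cnt the rows on that date.
# The nearest-rank percentile p is the smallest Value whose rank satisfies
# rn >= p * cnt / 100, written with integers as rn * 10 >= p/10 * cnt.
PERCENTILE_SQL = """
SELECT Date,
       MIN(CASE WHEN rn * 10 >= cnt     THEN Value END) AS Pct10,
       MIN(CASE WHEN rn * 10 >= 9 * cnt THEN Value END) AS Pct90
FROM (
    SELECT Date, Value,
           ROW_NUMBER() OVER (PARTITION BY Date ORDER BY Value) AS rn,
           COUNT(*)     OVER (PARTITION BY Date)                AS cnt
    FROM mydata
) AS ranked
GROUP BY Date
ORDER BY Date
"""


def percentiles_by_date_sql(rows):
    """Load rows into an in-memory table and run the window-function query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mydata (Date TEXT, UserId INTEGER, Value REAL)")
    conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", rows)
    result = conn.execute(PERCENTILE_SQL).fetchall()
    conn.close()
    return result


for date, pct10, pct90 in percentiles_by_date_sql(SAMPLE_ROWS):
    print(date, pct10, pct90)
```

On the sample data this yields 1 and 2 for 2013-01-01 and 1 and 1 for 2013-01-02, matching the expected result in the question. The table is scanned once and ranked per date, rather than being joined to itself several times.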
This question, "mysql - Percentiles in MySQL grouped by date", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/19501132/