这里,我用MySQL 以及Python 分别实现了rank 窗口函数。
原始表信息:
点击(此处)折叠或打开
- t_girl=# \d group_concat;
- Table "ytt.group_concat"
- Column | Type | Modifiers
- ----------+-----------------------+-----------
- rank | integer |
- username | character varying(20) |
表数据
点击(此处)折叠或打开
- t_girl=# select * from group_concat;
- rank | username
- ------+----------
- 100 | Lucy
- 127 | Lucy
- 146 | Lucy
- 137 | Lucy
- 104 | Lucy
- 121 | Lucy
- 136 | Lily
- 100 | Lily
- 100 | Lily
- 105 | Lily
- 136 | Lily
- 149 | ytt
- 116 | ytt
- 116 | ytt
- 149 | ytt
- 106 | ytt
- 117 | ytt
- (17 rows)
- Time: 0.638 ms
PostgreSQL 的rank 窗口函数示例:
点击(此处)折叠或打开
- t_girl=# select username,rank,rank() over(partition by username order by rank desc) as rank_cnt from group_concat;
- username | rank | rank_cnt
- ----------+------+----------
- Lily | 136 | 1
- Lily | 136 | 1
- Lily | 105 | 3
- Lily | 100 | 4
- Lily | 100 | 4
- Lucy | 146 | 1
- Lucy | 137 | 2
- Lucy | 127 | 3
- Lucy | 121 | 4
- Lucy | 104 | 5
- Lucy | 100 | 6
- ytt | 149 | 1
- ytt | 149 | 1
- ytt | 117 | 3
- ytt | 116 | 4
- ytt | 116 | 4
- ytt | 106 | 6
- (17 rows)
- Time: 131.150 ms
MySQL 提供了group_concat 聚合函数可以变相的实现:
点击(此处)折叠或打开
- mysql>
- select a.username, a.rank, find_in_set(a.rank,b.rank_gp) as rank_cnt
- from group_concat as a ,
- (select username,group_concat(rank order by rank desc separator ',') as rank_gp from group_concat group by username
- ) b
- where a.username = b.username order by a.username asc,a.rank desc;
- +----------+------+----------+
- | username | rank | rank_cnt |
- +----------+------+----------+
- | Lily | 136 | 1 |
- | Lily | 136 | 1 |
- | Lily | 105 | 3 |
- | Lily | 100 | 4 |
- | Lily | 100 | 4 |
- | Lucy | 146 | 1 |
- | Lucy | 137 | 2 |
- | Lucy | 127 | 3 |
- | Lucy | 121 | 4 |
- | Lucy | 104 | 5 |
- | Lucy | 100 | 6 |
- | ytt | 149 | 1 |
- | ytt | 149 | 1 |
- | ytt | 117 | 3 |
- | ytt | 116 | 4 |
- | ytt | 116 | 4 |
- | ytt | 106 | 6 |
- +----------+------+----------+
- 17 rows in set (0.02 sec)
当然了,如果MySQL SQL不太熟悉,可以用程序来处理,比如我下面用python 实现了rank 函数,执行结果如下:(脚本源代码最后)
点击(此处)折叠或打开
- >>> ================================ RESTART ================================
- >>>
- username | rank | rank_cnt
- --------------------------------
- ytt |149 |1
- ytt |149 |1
- ytt |117 |3
- ytt |116 |4
- ytt |116 |4
- ytt |106 |6
- Lucy |146 |1
- Lucy |137 |2
- Lucy |127 |3
- Lucy |121 |4
- Lucy |104 |5
- Lucy |100 |6
- Lily |136 |1
- Lily |136 |2
- Lily |105 |3
- Lily |100 |4
- Lily |100 |4
- (17 Rows.)
- Time: 0.162 Seconds.
附上脚本代码:
点击(此处)折叠或打开
- from __future__ import print_function
- from datetime import date, datetime, timedelta
- import mysql.connector
- import time
- # Created by ytt 2014/5/14.
- # Rank function implement.
- def db_connect(is_true):
- cnx = mysql.connector.connect(host='192.168.1.131',port='3306',user='python_user', password='python_user',database='t_girl',autocommit=is_true)
- return cnx
- def db_rs_rank(c1 ='username desc' ,c2 = ' rank desc'):
- # c1: partition column.
- # c2: sort column.
- time_start = time.time()
- cnx = db_connect(True)
- rs = cnx.cursor()
- query0 = "select username,rank from group_concat order by " + c1 + ", " + c2
- rs.execute(query0,multi=False)
- if rs.with_rows:
- rows = rs.fetchall()
- else:
- return "No rows affected."
- i = 0
- j = 0
- k = 1
- result = []
- field1_compare = rows[0][0]
- field2_compare = rows[0][1]
- while i < len(rows):
- if field1_compare == rows[i][0]:
- j += 1
- if field2_compare != rows[i][1]:
- field2_compare =rows[i][1]
- k = j
- result.append((rows[i][0],rows[i][1],k))
- else:
- j = 1
- k = 1
- field1_compare = rows[i][0]
- result.append((rows[i][0],rows[i][1],k))
- i += 1
- i = 0
- rows_header = list(rs.column_names)
- rows_header.append('rank_cnt')
- print (rows_header[0].center(10,' ') + '|' + rows_header[1].center(10,' ') + '|' + rows_header[2].center(10,' '))
- print ('-'.center(32,'-'))
- while i < len(result):
- print (result[i][0].ljust(10,' ') + '|' + str(result[i][1]).ljust(10,' ') + '|' + str(result[i][2]).ljust(10,' '))
- i += 1
- rs.close()
- cnx.close()
- time_end = time.time()
- print ('(' + str(len(rows))+ ' Rows.)')
- print ("Time:" + str(round((time_end-time_start),3)).rjust(10,' ') + ' Seconds.')
- if __name__=='__main__':
- db_rs_rank()