I've been looking at ranking algorithms recently, specifically those used by Reddit and Hacker News. The algorithms themselves are simple enough, but I don't quite understand how they are used.One thing that I could do is implement the algorithm straight in SQL, so that every time a user goes to a page displaying ranked posts, something like this would run:SELECT thing1, thing2 FROM tableORDER BY ranking_algorithm DESCLIMIT page*20, 20There are several similar questions on SO, but the only answer given is to put the ranking algorithm inside the SQL query. Then the thread dies...Putting the algorithm in the SQL query fine on a smaller scale, but what if the website has a large number of users and a very large number of posts? That means that the every time any user opens a page that displays ranked posts, that query will be run. That can't be very efficient.Now, Reddit and Hacker News don't run their ranking algorithms in as SQL queries, but in python and ark respectively. So how and when exactly are they used?One possible solution is to take all the relevant info from every post and store it in some data structure on the web server. Then rank and sort this data structure.Every time someone opens a page that shows the ranked posts, you just go to the data structure, retrieve the correct range of posts, and display them.Then every half hour or so, you retrieve the most up to date information from the server, rank it, sort it, and update the data structure.Other less expensive queries, such as retrieving and displaying all the info from a specific post, or displaying the newest posts (as opposed to the best scored) could be done in SQL every time the relevant page is opened.The advantage is that your database is being hit (for the expensive ranking query) only once every half hour. The disadvantage is that you need to have a duplicate of a large chunk of your database. 解决方案 Reddit uses Pyrex, the sort algorithm is a Python C extension to improve performance.So, you can do the same in SQL when the record is updated, pex: when is up or down voted.The pseudocode you must to translate to your SQL engine syntax:function hot(ups, downs, date){ score = ups - downs; order = log(max(abs(score), 1), 10); if (score>0){ sign = 1; } else { if (score<0){ sign = -1; } else { sign = 0; } } td = date - datetime(1970,1,1); seconds = td.days * 86400 + td.seconds + (float(td.microseconds) / 1000000) - 1134028003; return round(order + sign * seconds / 45000, 7);}So you must to store in the post table the ups, downs, date and the hot function result. And then you can make a sort in the hot column.You can see the Reddit source code here: http://code.reddit.com/ 这篇关于Reddit和Hacker News排名算法如何使用?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持! 上岸,阿里云!
08-23 04:25