


I have authenticated users in my application who have access to a shared database of up to 500,000 items. Each user has their own public-facing website and needs to be able to prioritize the items displayed on their own site (think upvote).


Out of the 500,000 items, a user may have at most 200 prioritized items; the order of the remaining items is less important.


Each of the users will prioritize the items differently.


I initially asked a similar MySQL question here ("Mysql results sorted by list which is unique for each user") and got a good answer, but I believe a better option may be a non-SQL indexed solution.


Can this be done in Lucene? Or is there another search technology that would be better suited to this?

P.S. Google implements a similar setup for its search results: if you are logged in, you can prioritize and exclude results in your own searches.

Update: re-tagged with sphinx, as I have been reading the documentation and I believe it may be able to do what I am looking for with "per-document attribute values" stored in memory. I'd be interested to hear any feedback on this from Sphinx gurus.



You'll definitely want to store the id of the item in each document object when building your index. There are a few ways to do the next step, but an easy one would be to take the prioritized items and add them to your search query, with something like this for each special item:

"OR item_id=%d+X"


where X is the amount of boost you'd like to use. You'll probably need to empirically tweak this number to make sure that just being "upvoted" doesn't put it to the top of a list searching for something totally unrelated.
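As a rough sketch of the idea above, the snippet below builds such a query string for one user. It assumes Lucene's classic query parser syntax, where a clause is written `field:value` and boosted with `^`, so the pattern becomes `item_id:<id>^<boost>`. The function name, the `item_id` field name, the example base query, and the boost value are all hypothetical placeholders, not part of any library.

```python
def build_boosted_query(base_query, prioritized_ids, boost=2.0, id_field="item_id"):
    """Append one boosted OR clause per prioritized item to a query string.

    Assumes Lucene classic query parser syntax (field:value^boost).
    All names here are illustrative, not a real API.
    """
    if not prioritized_ids:
        # No upvoted items for this user: run the query unchanged.
        return base_query
    clauses = " OR ".join(
        f"{id_field}:{item_id}^{boost}" for item_id in prioritized_ids
    )
    # Parenthesize the original query so the OR clauses do not change its meaning.
    return f"({base_query}) OR {clauses}"


# Example: a user who has upvoted items 42 and 1007.
query = build_boosted_query("title:guitar", [42, 1007], boost=2.0)
print(query)  # (title:guitar) OR item_id:42^2.0 OR item_id:1007^2.0
```

Note that a plain OR also lets an upvoted item match when it is unrelated to the base query, which is one more reason the boost value needs tuning against real data.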

Doing it this way will at least save you a lot of annoying post-processing steps that would require iterating over the whole result set. Ideally, the proper ordering will come straight out of the index query.


07-29 11:24