问题描述
在CouchDB中是否有一种很好的方法来模仿SELECT COUNT(DISTINCT字段)的行为?
Is there a good way to mimic the behavior of SELECT COUNT(DISTINCT field) in CouchDB?
想象一下我们有以下文档,它记录了时间用户播放了某首歌曲:
Imagine we have the following document, which records the time at which a user played a certain song:
{
song_id: "happy birthday",
user_id: "boris",
date_played: [2011, 11, 14, 00, 12, 55],
_id: ...
}
我想知道用户 boris曾经播放过的不同歌曲的数量。如果我们的用户已经听了20次生日快乐,那首歌仍然只占总歌曲数的+1。
I'd like to know the number of distinct songs ever played by our user "boris". If our user has listened to "happy birthday" 20 times, that song should still contribute just +1 to the overall song count.
在MySQL中,我只需执行 SELECT COUNT(DISTINCT song_id)FROM播放WHERE user_id = boris
,但是在CouchDB中编写此代码时我空白。
In MySQL, I'd simply execute SELECT COUNT(DISTINCT song_id) FROM plays WHERE user_id = "boris"
, but I'm drawing a blank when it comes to writing this in CouchDB.
工作环境1:如果我更改了架构,而是将所有歌曲播放存储在单个用户文档中的 boris,那么我可以写一个映射到仅发出不同的值。但是,如果我想在last.fm的规模上构建某些东西,我担心的是,随着 boris文档大小(播放次数)的持续增长,更新将花费很长时间。 (可能还会有一个最终要达到的最大文档大小。)
Work-Around 1: If I changed my schema and instead stored all the song-plays inside a single user document for "boris" I could then write a map to emit only distinct values. However, if I wanted to build something on the scale of last.fm, my fear is that updates would start taking a very long time as the "boris" document size (number of plays) continued to grow. (There might also be a maximum document size that I would eventually hit).
工作环境2:我也可以编写一个map函数返回 all 的不同记录,我的python脚本可以总结这些记录;但是再加上成千上万首不同的歌曲,这也会变得非常慢。
Work-Around 2: I could also write a map function to return all of the distinct records, which my python script could sum up itself; but again with hundreds of thousands of distinct songs this would become very slow as well.
我还缺少其他选择吗?
推荐答案
此答案由Zachary Zolton在沙发上的邮件列表中提供:
This answer was provided by Zachary Zolton on the couchdb mailing list:
开始您已经有了一个视图,可以为您提供鲍里斯(Boris)的5万首独特的
歌曲,您可以使用_list函数返回行数。
Since you've already got a view that'll give you Boris's 50k uniquesongs, you could use a _list function to return the number of rows.
像这样应该可以解决问题:
Something like this should do the trick:
function() {
var count = 0;
while(getRow()) count++;
return JSON.stringify({count: count});
}
如果使用相同的视图,键范围和$查询此列表函数b $ b组级别,它将仅使用一些JSON进行响应,例如: { count: 50612}
If you query this list function, with the same view, key range andgroup level, it'll just respond with a bit of JSON, such as: {"count":"50612"}
您可以在此处阅读更多内容:
You can read up more here:
- http://guide.couchdb.org/draft/transforming.html
- http://wiki.apache.org/couchdb/Formatting_with_Show_and_List
这篇关于如何在CouchDB中编写SELECT COUNT(DISTINCT字段)查询?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!