Question
I'm loading products via an infinite scroll, in chunks of 12 at a time. At times, I may want to sort these by how many followers they have.
Below is how I'm tracking how many followers each product has. Follows live in a separate collection because of the 16MB document size cap, and because the number of follows should be unbounded.
Follow schema:
var FollowSchema = new mongoose.Schema({
    user: {
        type: mongoose.Schema.ObjectId,
        ref: 'User'
    },
    product: {
        type: mongoose.Schema.ObjectId,
        ref: 'Product'
    },
    timestamp: {
        type: Date,
        default: Date.now
    }
});
Product schema:
var ProductSchema = new mongoose.Schema({
    name: {
        type: String,
        unique: true,
        required: true
    },
    followers: {
        type: Number,
        default: 0
    }
});
Whenever a user follows or unfollows a product, I run this function:
ProductSchema.statics.updateFollowers = function (productId, val) {
    // findOneAndUpdateAsync is presumably a promisified findOneAndUpdate
    // (e.g. via bluebird's promisifyAll).
    return Product
        .findOneAndUpdateAsync({
            _id: productId
        }, {
            $inc: {
                'followers': val
            }
        }, {
            upsert: true,
            'new': true
        })
        .then(function (updatedProduct) {
            return updatedProduct;
        })
        .catch(function (err) {
            // Note: logging here swallows the error instead of propagating it.
            console.log('Product follower update err : ', err);
        });
};
My questions:
1: Is there a chance that the incremented "followers" value within Product could hit some sort of error, resulting in mismatched / inconsistent data?
2: Would it be better to write an aggregation to count followers for each Product, or would that be too expensive / slow?
Eventually, I'll probably rewrite this in a graph DB, as that seems better suited, but for now this is an exercise in mastering MongoDB.
Answer
1. If you increment after inserting (or decrement after removing), there is a chance of ending up with inconsistent data: for example, the insertion succeeds but the increment fails.
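One common way to narrow (though not fully close) that window is to pair the two writes and compensate when the second one fails. A minimal pymongo sketch, under my own assumptions of pymongo 3.x and collections named follows and products matching the schemas above:

from pymongo import MongoClient
from pymongo.errors import PyMongoError

db = MongoClient('mongodb://127.0.0.1/test').get_default_database()

def follow_product(user_id, product_id):
    # Insert the follow document first...
    follow_id = db.follows.insert_one(
        {'user': user_id, 'product': product_id}).inserted_id
    try:
        # ...then bump the denormalized counter on the product.
        db.products.update_one({'_id': product_id},
                               {'$inc': {'followers': 1}})
    except PyMongoError:
        # Compensate: remove the follow again so the counter and the
        # follows collection stay in sync, then surface the error.
        db.follows.delete_one({'_id': follow_id})
        raise

The compensating delete can of course fail too, so this only shrinks the inconsistency window; a periodic job that recounts follows per product is the usual safety net.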
2. Intuitively, an aggregation is much more expensive than a find in this case. I ran a benchmark to prove it.
First, randomly generate 1000 users, 1000 products, and 10000 follows; one possible seeding sketch is shown below.
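The original seeding script isn't shown; a minimal sketch that would produce an equivalent dataset (the product names and the denormalized followers counters are assumptions of mine) could look like this:

import random
from collections import Counter

from bson import ObjectId
from pymongo import MongoClient

db = MongoClient('mongodb://127.0.0.1/test').get_default_database()

# 1000 users and 1000 products, with each followers counter starting at 0.
users = [{'_id': ObjectId()} for _ in range(1000)]
products = [{'_id': ObjectId(), 'name': 'product-%d' % i, 'followers': 0}
            for i in range(1000)]
db.users.insert_many(users)
db.products.insert_many(products)

# 10000 random follows (duplicates are possible, which is fine for a benchmark).
follows = [{'user': random.choice(users)['_id'],
            'product': random.choice(products)['_id']}
           for _ in range(10000)]
db.follows.insert_many(follows)

# Denormalize the counters so find() and the aggregation agree.
for product_id, count in Counter(f['product'] for f in follows).items():
    db.products.update_one({'_id': product_id},
                           {'$set': {'followers': count}})

Then, use this code to benchmark: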
import timeit

from pymongo import MongoClient

db = MongoClient('mongodb://127.0.0.1/test', tz_aware=True).get_default_database()

def foo():
    # Page 2 of products, sorted by the denormalized counter.
    result = list(db.products.find().sort('followers', -1).limit(12).skip(12))

def bar():
    # The same page, but counting followers on the fly with an aggregation.
    result = list(db.follows.aggregate([
        {'$group': {'_id': '$product', 'followers': {'$sum': 1}}},
        {'$sort': {'followers': -1}},
        {'$skip': 12},
        {'$limit': 12}
    ]))

if __name__ == '__main__':
    t = timeit.timeit('foo()', 'from __main__ import foo', number=100)
    print('time: %f' % t)
    t = timeit.timeit('bar()', 'from __main__ import bar', number=100)
    print('time: %f' % t)
Output:
time: 1.230138
time: 3.620147
Creating an index on followers speeds up the find query considerably (a single-field index can be traversed in either direction, so {followers: 1} also serves the descending sort):
db.products.createIndex({followers: 1})
time: 0.174761
time: 3.604628
And if you need attributes from the product, such as name, you need another O(n) query after the aggregation.
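For illustration, a sketch of that extra lookup, reusing the aggregation above (the name stitching is my own example, not part of the original benchmark):

# Hypothetical follow-up: fetch the 12 product documents for the current page
# with a single $in query and stitch the names onto the aggregation results.
page = list(db.follows.aggregate([
    {'$group': {'_id': '$product', 'followers': {'$sum': 1}}},
    {'$sort': {'followers': -1}},
    {'$skip': 12},
    {'$limit': 12}
]))
ids = [doc['_id'] for doc in page]
by_id = {p['_id']: p for p in db.products.find({'_id': {'$in': ids}})}
for doc in page:
    doc['name'] = by_id[doc['_id']]['name']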
I'd guess that as the data scales up, the aggregation becomes much slower still. If needed, I can benchmark on large-scale data.