本文介绍了我可以在数据库多对多字段中使用计数器来减少查找吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试找出访问存储在连接对象中的数据的最快方法.下面的示例与我的问题类似,但上下文不同,因为我处理的实际数据集的关系有些不直观.

I am trying to figure out the fastest way to access data stored in a junction object. The example below is analagous to my problem, but with a different context, because the actual dataset I am dealing with is somewhat unintuitive in its relationships.

我们有 3 个类:UserProductRating.用户与 Product 具有多对多关系,Rating 作为连接/通过"类.

We have 3 classes: User, Product, and Rating. User has a many-to-many relationship to Product with Rating as the junction/'through' class.

Rating 对象存储几个问题的答案,这些问题是 1-5 范围内的整数评分(示例问题:Product 的质量如何,如何是 Product 的价值,Product 对用户有多友好).为简化起见,假设每个 User 对他们购买的每个 Product 进行评分.

The Rating object stores the answers to several questions which are integer ratings on a scale of 1-5 (Example questions: How is the quality of the Product, how is the value of the Product, how user-friendly is the Product). For simplification assume every User rates every Product they buy.

现在这是我要执行的计算:对于 User,计算他们购买的所有 Product 的平均评分(即平均评分来自所有其他 User,其中一个将来自这个 User 自己).然后我们可以告诉用户平均而言,您购买的产品被所有购买该产品的客户评为价值 3/5".

Now here is the calculation I want to perform: For a User, calculate the average rating of all the Products they have bought (that is, the average rating from all other Users, one of which will be from this User themself). Then we can tell the user "On average, you buy products rated 3/5 for value by all customers who bought that product".

简单而缓慢的方法就是迭代用户的所有评论对象.如果我们假设每个用户都购买了少量(

The simple and slow way is just to iterate over all of a user's review objects. If we assume that each user has bought a small (<100) number of products, and each product has n ratings, this is O(100n) = O(n).

但是,我也可以执行以下操作:在 Product 类上,保留选择每个数字的 Rating 数量的计数器(例如,有多少 用户 s 对该产品的价值评价为 3/5).如果每次对 Product 进行评分时都增加该计数器,那么计算给定 Product 的平均值只需要检查每个 Rating 的 5 个计数器标准.

However, I could also do the following: On the Product class, keep a counter of the number of Rating s that selected each number (e.g. how many User s rated this product 3/5 for value). If you increment that counter every time a Product is rated, then computing the average for a given Product just requires checking the 5 counters for each Rating criteria.

这是一种有效的技术吗?它是否普遍使用/是否有名称?对我来说似乎很直观,但我对数据库的了解不够,无法判断是否存在一些根本性缺陷.

Is this a valid technique? Is it commonly employed/is there a name for it? It seems intuitive to me, but I don't know enough about databases to tell whether there's some fundamental flaw or not.

推荐答案

这是正常的.它最终是缓存:对状态进行冗余编码,以牺牲其他使用模式为代价使某些使用模式受益.当然,这也是一种复杂化.

This is normal. It is ultimately caching: encoding of state redundantly to benefit some patterns of usage at the expense of others. Of course it's also a complexification.

仅仅因为 RDBMS 数据结构是关系并不意味着您不能从某种简单的形式重新安排编码状态的方式.例如非规范化.

Just because the RDBMS data structure is relations doesn't mean you can't rearrange how you are encoding state from some straightforward form. Eg denormalization.

(有时冗余设计(包括像您这样的设计)在它们实际上不是非规范化的结果并且冗余不是非规范化导致或规范化消除的那种时被称为非规范化".SQL 数据库中的跨表依赖/约束 事实上,可以合理地将您的情况描述为涉及规范化而不保留 FD(函数依赖).从一个带有用户 id 和其他列,它们的 ratings(一个关系)和它的 counter.然后 ratings 在功能上决定 counter 因为 counter = select count(*) from ratings. 分解为 user etc + counter, 即表 User, 和 user + ratings, 取消分组到表 Rating.)

(Sometimes redundant designs (including ones like yours) are called "denormalized" when they are not actually the result of denormalization and the redundancy is not the kind that denormalization causes or normalization removes. Cross Table Dependency/Constraint in SQL Database Indeed one could reasonably describe your case as involving normalization without preserving FDs (functional dependencies). Start with a table with a user's id & other columns, their ratings (a relation) & its counter. Then ratings functionally determines counter since counter = select count(*) from ratings. Decompose to user etc + counter, ie table User, and user + ratings, which ungroups to table Rating. )

你对谷歌搜索时使用的最佳术语有什么建议

我的一个经常评论:谷歌很多清晰、简洁和您的问题/问题/目标/需要的特定措辞,以及各种术语子集你可能会发现它们的标签 &没有您的特定名称(变量/数据库/表/列/约束/等).例如我什么时候可以在数据库中冗余存储(总和或总计)".人工措辞,而不仅仅是关键词,似乎有帮助.您最好的选择可能是优化 SQL 数据库设计以提高性能.有整本书('amazon isbn'),一些在线('pdf').(但可能主要是重新查询).研究与仓储相关的技术,因为 OLTP 数据库充当 OLAP 数据库的输入缓冲区,并使用 SQL 处理大数据.(例如快照调度.)

A frequent comment by me: Google many clear, concise & specific phrasings of your question/problem/goal/desiderata with various subsets of terms & tags as you may discover them with & without your specific names (of variables/databases/tables/columns/constraints/etc). Eg 'when can i store a (sum OR total) redundantly in a database'. Human phrasing, not just keywords, seems to help. Your best bet may be along the lines of optimizing SQL database designs for performance. There are entire books ('amazon isbn'), some online ('pdf'). (But maybe mostly re queries). Investigate techniques relevant to warehousing, since an OLTP database acts as an input buffer to an OLAP database, and using SQL with big data. (Eg snapshot scheduling.)

PS 我称之为缓存"(标签 caching 也是如此) 是(我的典型)相当抽象,以至于有严肃的笑话说 CS 中的一切都在缓存.(谷歌搜索......计算机科学中只有两个难题:缓存失效和命名事物."--Phil Karlton.)(欢迎两者.)

PS My calling this "caching" (so does tag caching) is (typical of me) rather abstract, to the point where there are serious-jokes that everything in CS is caching. (Googling... "There are only two hard problems in Computer Science: cache invalidation and naming things."--Phil Karlton.) (Welcome to both.)

这篇关于我可以在数据库多对多字段中使用计数器来减少查找吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-03 22:26
查看更多