Question
I am working on a group project, and we are discussing whether to calculate the data we want from an existing database and store it in a new database to query later, or to calculate it from the existing database every time we need it. What are the pros and cons of each approach? Is there any advice you could give?
Edit: Here is a more elaborate explanation. We have a large database that has a lot of information submitted to it daily. We are building a system to track certain data points. For example, we are getting the count of how many times a user does something that is entered in the database. Using this example (our actual idea is a bit more complex), we are discussing two methods of getting the count of actions per user. The first method is to create a database that stores the users and their action counts, and to query this database every time we need the action count. The second method is to query the large database and count the actions per user every time we need the count. I hope this explanation helps. Thoughts?
Edit 2: Two more things that may be useful to point out: 1: I only have read access to the large database, and 2: my ultimate goal is to display this information on a web page for end users.
Recommended answer
This is a generic question about optimization by caching. The following was my answer to essentially the same question. Even though that question provided a bunch of different details, none of them was specific enough to merit a non-generic answer either:
Until you can show that they are inadequate, views and calculated columns are the simplest.
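As a minimal sketch of the "view first" option, the following uses Python's built-in sqlite3 module with an in-memory database standing in for the large one. The table name `actions`, its columns, the view name `user_action_counts`, and the helper `action_counts` are all hypothetical, chosen only to mirror the per-user action count from the question. It also assumes you can define a view somewhere, which read-only access to the source database may not allow directly.

```python
import sqlite3

# Hypothetical stand-in for the large source database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (user_id TEXT NOT NULL, action TEXT NOT NULL);
    INSERT INTO actions VALUES
        ('alice', 'submit'), ('alice', 'submit'), ('bob', 'submit');

    -- The view stores no data; counts are computed when it is queried.
    CREATE VIEW user_action_counts AS
        SELECT user_id, COUNT(*) AS action_count
        FROM actions
        GROUP BY user_id;
""")


def action_counts(conn):
    """Return {user_id: action_count} read through the view."""
    rows = conn.execute(
        "SELECT user_id, action_count FROM user_action_counts"
    )
    return dict(rows)


counts = action_counts(conn)
```

Because the view is computed on demand, it always reflects the current rows in `actions`; whether that is fast enough is exactly what would have to be measured before adding any cache.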
The whole idea of a DBMS is to store a representation of your application state as the database (with normalization reducing its redundancy), and then to query it and let the DBMS implement and optimize the calculation of the answer. You haven't presented a reason for not doing that in the most straightforward way possible.
Always make sure an application is reading its own personal ("external") database that is a view of "the" ("conceptual") database, so that when you change the implementation of the former (plus the rest of some combined interface) by the latter (plus the rest of some combined mechanisms), your applications do not have to change ("logical independence"). Here the applications are your users' and your trackers'.
Ultimately you must instrument and guesstimate. When it is worth it, you start caching, preferably as much as possible in terms of high-level notions like views and snapshots and as little as possible in non-DBMS code. One of the benefits of the relational model is that it is easy to describe a straightforward relational interface in terms of another straightforward relational interface. You protect your applications from change by offering an interface that hides secrets of implementation, or which of a family of interfaces is the current one.
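One way to picture this is the sketch below, again using Python's sqlite3 with an in-memory database as a stand-in. All names (`actions`, `user_action_counts`, `user_action_counts_snapshot`, and the three helper functions) are hypothetical. The application only ever queries `user_action_counts`; that name starts out as a live view and is later redefined over a stored snapshot, without the application query changing. SQLite has no built-in snapshot or materialized view, so a plain table refreshed by hand is used here as an assumption; other DBMSs offer this more directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (user_id TEXT NOT NULL, action TEXT NOT NULL);
    INSERT INTO actions VALUES
        ('alice', 'submit'), ('alice', 'submit'), ('bob', 'submit');
    CREATE VIEW user_action_counts AS
        SELECT user_id, COUNT(*) AS action_count
        FROM actions GROUP BY user_id;
""")


def read_counts(conn):
    """The application's only query; it never names the underlying tables."""
    return dict(conn.execute(
        "SELECT user_id, action_count FROM user_action_counts"))


def switch_to_snapshot(conn):
    """Redefine the interface view over a stored copy of the counts."""
    conn.executescript("""
        CREATE TABLE user_action_counts_snapshot AS
            SELECT user_id, COUNT(*) AS action_count
            FROM actions GROUP BY user_id;
        DROP VIEW user_action_counts;
        CREATE VIEW user_action_counts AS
            SELECT user_id, action_count FROM user_action_counts_snapshot;
    """)


def refresh_snapshot(conn):
    """Recompute the stored counts, e.g. from a scheduled job."""
    with conn:  # commit both statements together
        conn.execute("DELETE FROM user_action_counts_snapshot")
        conn.execute("""
            INSERT INTO user_action_counts_snapshot
            SELECT user_id, COUNT(*) FROM actions GROUP BY user_id
        """)


live = read_counts(conn)      # computed on every call
switch_to_snapshot(conn)
cached = read_counts(conn)    # same query, now served from the snapshot
```

The trade-off is visible in the sketch: after the switch, reads are cheap but only as fresh as the last `refresh_snapshot` call, so the refresh interval becomes a design decision to base on measurement.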