


I am denormalising an OLTP database for use in a DWH. At the moment I am denormalising study groups.

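  • Every study group has a key pointing at one project.
  • Every project has a key pointing at one department.
  • Every department has a key pointing at one university.
  • Every university has a key pointing at one city.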

Now I know that you are supposed to denormalize the sh*t out of your OLTP, but in this DWH, department will be a dimension on its own. This goes for university as well. Would it suffice to add a key from study group pointing at department, or is it wiser to denormalize as far as you can and add all attributes from department, and all attributes from its M:1 related tables, to the study group dimension? Even when department and university will be dimensions by themselves?


In other words: how far/deep do you go when denormalizing?



The key concept behind a dimensional model is:

  • Keep fact tables in 3NF (third normal form);
  • Denormalize dimensions to 2NF (second normal form).


So ideally, the only joins you should have in your model are the joins between fact tables and relevant dimensions.


As part of this philosophy:

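  • Avoid "snowflake" designs, where dimensions contain keys to other dimensions. It's always possible to come up with a data model that has the same functionality as a snowflake without violating the 3NF/2NF rule;
  • Never have any direct joins between two separate dimensions (e.g., department and study group). All relations between dimensions must be resolved via fact tables;
  • Never have any direct joins between two separate fact tables. Any relations between fact tables must be resolved via shared dimensions.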
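To make these rules concrete, here is a minimal sketch of a star schema using Python's built-in `sqlite3` module. All table and column names (`dim_department`, `dim_study_group`, `fact_enrollment`, `student_count`) and the sample rows are hypothetical, since the question doesn't say what the facts are. The point is only that the two dimensions hold no keys to each other and every join goes through the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimension: department with university and city flattened in.
CREATE TABLE dim_department (
    department_key  INTEGER PRIMARY KEY,
    department_name TEXT,
    university_name TEXT,
    city_name       TEXT
);
-- Hypothetical dimension: study group, deliberately without a department key.
CREATE TABLE dim_study_group (
    study_group_key  INTEGER PRIMARY KEY,
    study_group_name TEXT
);
-- Hypothetical fact table: the only place where the two dimensions meet.
CREATE TABLE fact_enrollment (
    department_key  INTEGER REFERENCES dim_department(department_key),
    study_group_key INTEGER REFERENCES dim_study_group(study_group_key),
    student_count   INTEGER
);
INSERT INTO dim_department VALUES
    (1, 'Physics', 'University A', 'City X'),
    (2, 'History', 'University B', 'City Y');
INSERT INTO dim_study_group VALUES (10, 'Group 1'), (11, 'Group 2');
INSERT INTO fact_enrollment VALUES (1, 10, 5), (1, 11, 7), (2, 11, 3);
""")

# Every join is fact-to-dimension; the dimensions are never joined directly.
rows = conn.execute("""
    SELECT d.university_name, g.study_group_name, SUM(f.student_count)
    FROM fact_enrollment AS f
    JOIN dim_department  AS d ON d.department_key  = f.department_key
    JOIN dim_study_group AS g ON g.study_group_key = f.study_group_key
    GROUP BY d.university_name, g.study_group_name
    ORDER BY d.university_name, g.study_group_name
""").fetchall()

for row in rows:
    print(row)
```

Note that 'Group 2' shows up under both departments here. That kind of relation is easy to express through the fact table, and awkward to express with a single department key on the study group dimension.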


Finally, consider that dimensional design, besides optimizing the data for querying, serves a second important purpose: it's a semantic model of the business (or whatever else it represents). So, when deciding how to combine data elements into dimensions and facts, consider their "logical affinity": they should make intuitive sense to the end users. If you have a hard time explaining the meaning of your dimension or fact table to a BI analyst, you've most likely made a modeling mistake.


For example, in your case you should consider the logical relations between universities, departments, study groups, etc. It's very likely that University/Department form a natural hierarchy. If so, they should belong to the same dimension. Study group, on the other hand, might not: let's assume it's possible to form study groups across multiple universities and/or multiple departments. Such many-to-many (M:M) relations are a clear indication that they should be resolved via fact tables. In addition, relations between universities and departments are stable (they rarely change), while study groups are formed and dissolved very often, and thus should be modeled separately.
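As a rough illustration of folding the University/Department hierarchy into one dimension, the sketch below builds a flattened `dim_department` table from normalized source tables, again with Python's `sqlite3`. The OLTP table layouts, column names, and sample rows are assumptions made up for the example. City is included because the question says each university points at one city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed normalized OLTP tables (names and columns are illustrative).
CREATE TABLE city (
    city_id INTEGER PRIMARY KEY,
    name    TEXT
);
CREATE TABLE university (
    university_id INTEGER PRIMARY KEY,
    name          TEXT,
    city_id       INTEGER REFERENCES city(city_id)
);
CREATE TABLE department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT,
    university_id INTEGER REFERENCES university(university_id)
);
INSERT INTO city VALUES (1, 'City X'), (2, 'City Y');
INSERT INTO university VALUES (1, 'University A', 1), (2, 'University B', 2);
INSERT INTO department VALUES (1, 'Physics', 1), (2, 'History', 1), (3, 'Law', 2);

-- Flatten the M:1 chain department -> university -> city into one dimension.
CREATE TABLE dim_department AS
SELECT dep.department_id AS department_key,
       dep.name          AS department_name,
       u.name            AS university_name,
       c.name            AS city_name
FROM department AS dep
JOIN university AS u ON u.university_id = dep.university_id
JOIN city       AS c ON c.city_id       = u.city_id;
""")

dim_rows = conn.execute(
    "SELECT department_key, department_name, university_name, city_name "
    "FROM dim_department ORDER BY department_key"
).fetchall()

for row in dim_rows:
    print(row)
```

University and city names repeat on every department row. That redundancy is the intended trade-off: queries can filter or group by university or city without any extra joins.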


In general, if you see 1:1 or 1:M relations between dimensional elements, it's often an indication that they should be denormalized into the same table (again, only if their combination makes logical sense). If the relations are M:M, they most likely belong to different tables (you can force them into the same table, but such tables often look like Frankenstein creatures).


You can get much better help by making your question more specific: draw your dimensional model, post it, and ask about the specific issues or challenges you have. For general concepts, books by Kimball and Inmon are your best friends.

