链接维度的不同粒度级别的事实表

本文介绍了链接维度的不同粒度级别的事实表的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

数据仓库设计的新手.我有一个表示地理区域(例如郊区，城市，州)的非规范化维度表.这是一个变化缓慢的维度.

New to data warehouse design. I have a denormalised dimension table representing geographies (e.g. suburb, city, state). This is a slowly changing dimension.

还具有多个事实表，每个事实表处于不同的粒度级别.

Also have multiple fact tables, each at different grain levels.

是否可以对此建模，以便事实表使用代理键，同时保持非规范化维度表?

Is it possible to model this so the fact tables use surrogate keys, whilst maintaining a denormalised dimension table?

推荐答案

如果您实际上拥有相同的尺寸数据，但具有不同的粒度，则可以通过创建聚合"来处理此问题.方面.在您的示例中，复制dim_geo表定义(而不是数据)，将dim命名为dim_geo_city之类的名称，然后将所有列以比城市更低的粒度(例如，郊区_id，郊区)删除.如果您在状态级别有事实，那么您将以相同的方式创建dim_geo_state，以此类推，以进行进一步的聚合.

If you have effectively the same dimensional data but at different grains then you handle this by creating "aggregate" dimensions. In your example, copy the dim_geo table definition (not the data), name the dim to something like dim_geo_city and drop all the columns at a lower granularity than city (e.g. suburb_id, suburb). If you have facts at the state level then you would create dim_geo_state in the same way - and so on for any further levels of aggregation.

事实人群将继续引用dim_geo，但事实住房应引用dim_geo_city.

Fact_population will continue to reference dim_geo but fact_housing should reference dim_geo_city.

最简单的填充聚合暗淡的方法是在基本暗淡(dim_geo)上运行SELECT DISTINCT，并且仅包括目标暗淡(dim_geo_city)中存在的列-然后，您获取结果数据并应用适当的SCD逻辑将其插入/更新到目标昏暗状态.

The easiest way to populate aggregate dims is to run a SELECT DISTINCT on the base dim (dim_geo) and only include the columns that exist in the target dim (dim_geo_city) - you then take the resulting data and apply the appropriate SCD logic to insert/update it into the target dim.

这篇关于链接维度的不同粒度级别的事实表的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！