问题描述
数据仓库设计的新手.我有一个表示地理区域(例如郊区,城市,州)的非规范化维度表.这是一个变化缓慢的维度.
New to data warehouse design. I have a denormalised dimension table representing geographies (e.g. suburb, city, state). This is a slowly changing dimension.
还具有多个事实表,每个事实表处于不同的粒度级别.
Also have multiple fact tables, each at different grain levels.
是否可以对此建模,以便事实表使用代理键,同时保持非规范化维度表?
Is it possible to model this so the fact tables use surrogate keys, whilst maintaining a denormalised dimension table?
推荐答案
如果您实际上拥有相同的尺寸数据,但具有不同的粒度,则可以通过创建聚合"来处理此问题.方面.在您的示例中,复制dim_geo表定义(而不是数据),将dim命名为dim_geo_city之类的名称,然后将所有列以比城市更低的粒度(例如,郊区_id,郊区)删除.如果您在状态级别有事实,那么您将以相同的方式创建dim_geo_state,以此类推,以进行进一步的聚合.
If you have effectively the same dimensional data but at different grains then you handle this by creating "aggregate" dimensions. In your example, copy the dim_geo table definition (not the data), name the dim to something like dim_geo_city and drop all the columns at a lower granularity than city (e.g. suburb_id, suburb). If you have facts at the state level then you would create dim_geo_state in the same way - and so on for any further levels of aggregation.
事实人群将继续引用dim_geo,但事实住房应引用dim_geo_city.
Fact_population will continue to reference dim_geo but fact_housing should reference dim_geo_city.
最简单的填充聚合暗淡的方法是在基本暗淡(dim_geo)上运行SELECT DISTINCT,并且仅包括目标暗淡(dim_geo_city)中存在的列-然后,您获取结果数据并应用适当的SCD逻辑将其插入/更新到目标昏暗状态.
The easiest way to populate aggregate dims is to run a SELECT DISTINCT on the base dim (dim_geo) and only include the columns that exist in the target dim (dim_geo_city) - you then take the resulting data and apply the appropriate SCD logic to insert/update it into the target dim.
这篇关于链接维度的不同粒度级别的事实表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!