Problem description
I am working on a data warehouse project that involves integrating data from multiple source systems. I have set up an SSIS package that populates the customer dimension and uses the Slowly Changing Dimension (SCD) tool to keep track of updates to the customer.
I'm running into some issues. Take this example:
Source system A might have a record that looks like this:
First Name, Last Name, Zipcode
Jane, Doe, 14222
Source system B might have a record for the same client that looks like this:
First Name, Last Name, Zipcode
Jane, Doe, Unknown
If I first import the record from system A, I'll have the first name, last name, and zipcode. Great. Now, if I import the client record from system B, I can do fuzzy matching to recognize that this is the same person and use the SCD tool to update the information. But in this case, I'm going to lose the zipcode, because the 'Unknown' will overwrite the valid data.
I am wondering whether I am approaching this problem in the wrong way. The SCD tool doesn't seem to offer any way of selectively updating attributes based on whether the new data is valid. Would a MERGE statement work better? Am I making some kind of fundamental design mistake that I'm not seeing?
Thanks for any suggestions!
Recommended answer
In my experience, the built-in SCD tool is not flexible enough to handle this requirement.
Either a couple of MERGE statements or a series of UPDATE and INSERT statements will probably give you the most flexibility, both in logic and in performance.
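As a rough illustration of the kind of selective logic a plain UPDATE allows, the sketch below keeps the existing zipcode whenever the incoming value is a placeholder. It covers an overwrite-in-place (Type 1) attribute only and assumes the fuzzy match has already mapped both records to the same key. The table variables @DimCustomer and @StagingCustomer, their columns, and the choice of NULL, empty string, and 'Unknown' as placeholder values are all assumptions made for this example, not part of the schema in the question.

```sql
-- Hypothetical tables, for illustration only.
DECLARE @DimCustomer TABLE (
    CustomerKey INT PRIMARY KEY,
    FirstName   NVARCHAR(50),
    LastName    NVARCHAR(50),
    Zipcode     NVARCHAR(10)
);
DECLARE @StagingCustomer TABLE (
    CustomerKey INT PRIMARY KEY,
    FirstName   NVARCHAR(50),
    LastName    NVARCHAR(50),
    Zipcode     NVARCHAR(10)
);

-- Row already loaded from system A, and the incoming row from system B.
INSERT INTO @DimCustomer     VALUES (1, N'Jane', N'Doe', N'14222');
INSERT INTO @StagingCustomer VALUES (1, N'Jane', N'Doe', N'Unknown');

-- Overwrite the zipcode only when the incoming value is usable;
-- otherwise keep what the dimension already holds.
UPDATE d
SET d.Zipcode = CASE
        WHEN s.Zipcode IS NULL OR s.Zipcode IN (N'', N'Unknown') THEN d.Zipcode
        ELSE s.Zipcode
    END
FROM @DimCustomer AS d
JOIN @StagingCustomer AS s
    ON s.CustomerKey = d.CustomerKey;

SELECT CustomerKey, FirstName, LastName, Zipcode
FROM @DimCustomer;
```

With the sample rows above, the zipcode stays 14222. The same CASE expression could also be used in the change-detection predicate of a MERGE, so that a placeholder value is not treated as a change in the first place.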
There are probably published patterns out there for an SCD Type 2 MERGE statement, but here is the one I use:
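-- Statement 1: expire current rows that changed or disappeared, and insert brand-new keys.
-- Only the current version of each key takes part in the match.
Merge Target
Using Source
    On Target.[Key] = Source.[Key]
    And Target.IsCurrent = 1
When Matched And (
        Target.NonKeyAttribute <> Source.NonKeyAttribute
        Or IsNull(Target.NonKeyNullableAttribute, '') <> IsNull(Source.NonKeyNullableAttribute, '')
    )
    Then Update Set SCDEndDate = GetDate(), IsCurrent = 0
When Not Matched By Target Then
    Insert ([Key], ... , SCDStartDate, IsCurrent)
    Values (Source.[Key], ..., GetDate(), 1)
-- Guard on IsCurrent so that already-expired history rows are left untouched.
When Not Matched By Source And Target.IsCurrent = 1 Then
    Update Set SCDEndDate = GetDate(), IsCurrent = 0;

-- Statement 2: insert the new version of each row expired above.
Merge Target
Using Source
    On Target.[Key] = Source.[Key]
    And Target.IsCurrent = 1
-- These will be the changing rows that were expired in the first statement.
When Not Matched By Target Then
    Insert ([Key], ... , SCDStartDate, IsCurrent)
    Values (Source.[Key], ... , GetDate(), 1);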