Need help understanding alternatives to SCD in SSIS

Question

I am working on a data warehouse project that integrates data from multiple source systems. I have set up an SSIS package that populates the customer dimension and uses the Slowly Changing Dimension (SCD) tool to keep track of updates to each customer.

I'm running into some issues. Take this example:

Source system A might have a record that looks like this:

First Name, Last Name, Zipcode
Jane, Doe, 14222

Source system B might have a record for the same client that looks like this:

First Name, Last Name, Zipcode
Jane, Doe, Unknown

If I import the record from system A first, I'll have the first name, last name, and zipcode. Great. Now, if I import the client record from system B, I can do fuzzy matching to recognize that this is the same person and use the SCD tool to update the information. But in this case I'm going to lose the zipcode, because the 'Unknown' will overwrite the valid data.

I am wondering whether I am approaching this problem in the wrong way. The SCD tool doesn't seem to offer any way of selectively updating attributes based on whether the new data is valid. Would a MERGE statement work better? Am I making some kind of fundamental design mistake that I'm not seeing?

Thanks for any advice!

Recommended answer

In my experience, the built-in SCD tool is not flexible enough to handle this requirement.

Either a couple of MERGE statements or a series of UPDATE and INSERT statements will probably give you the most flexibility, both in the logic you can express and in performance.
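As an illustration of the kind of logic that becomes possible once the update is hand-written, here is a minimal Python sketch of a "do not overwrite valid data with a placeholder" rule. The placeholder set, the function names, and the column names are assumptions made for this example; they do not come from the question. In T-SQL the same rule could be written as a CASE expression over the staged and current values.

```python
from typing import Optional

# Hypothetical placeholder values that should never overwrite real data.
PLACEHOLDERS = {"", "unknown", "n/a"}


def is_valid(value: Optional[str]) -> bool:
    """Return True when the value carries real information."""
    return value is not None and value.strip().lower() not in PLACEHOLDERS


def resolve_attributes(current: dict, incoming: dict) -> dict:
    """Merge an incoming source record into the current dimension row,
    keeping the current value whenever the incoming one is a placeholder."""
    resolved = dict(current)
    for name, value in incoming.items():
        if is_valid(value):
            resolved[name] = value
    return resolved


# Record already loaded from system A, followed by the record from system B.
current = {"first_name": "Jane", "last_name": "Doe", "zipcode": "14222"}
incoming = {"first_name": "Jane", "last_name": "Doe", "zipcode": "Unknown"}
resolved = resolve_attributes(current, incoming)
print(resolved)
```

Applying a rule like this to the staged rows before comparing them with the dimension means an 'Unknown' zipcode never counts as a change, so it neither overwrites the stored value nor creates a new history row.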

There are probably published templates for handling SCD Type 2 with MERGE statements, but here is the pattern I use:

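-- Statement 1: expire current rows that changed or disappeared, and insert brand-new keys.
-- [Key] is bracketed because KEY is a reserved word in T-SQL.
Merge Target
  Using Source
    On Target.[Key] = Source.[Key]
    And Target.IsCurrent = 1  -- compare against the current version only

  When Matched And (
    Target.NonKeyAttribute <> Source.NonKeyAttribute
    Or IsNull(Target.NonKeyNullableAttribute, '') <> IsNull(Source.NonKeyNullableAttribute, '')
  )
  Then Update Set SCDEndDate = GetDate(), IsCurrent = 0

  When Not Matched By Target Then
    Insert ([Key], ... , SCDStartDate, IsCurrent)
    Values (Source.[Key], ..., GetDate(), 1)

  -- Guarded so that rows expired by earlier loads are not touched again.
  When Not Matched By Source And Target.IsCurrent = 1 Then
    Update Set SCDEndDate = GetDate(), IsCurrent = 0;

-- Statement 2: insert the new current version of the rows expired above.
Merge Target
  Using Source
    On Target.[Key] = Source.[Key]
    And Target.IsCurrent = 1  -- expired rows no longer match

  -- These will be the changing rows that were expired in the first statement.
  When Not Matched By Target Then
    Insert ([Key], ... , SCDStartDate, IsCurrent)
    Values (Source.[Key], ... , GetDate(), 1);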
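The same expire-then-insert idea can also be written as a plain UPDATE followed by an INSERT. Below is a minimal, runnable sketch using Python's built-in `sqlite3` module (SQLite has no MERGE statement). The table names `dim_customer` and `stg_customer`, the column names, and the `apply_scd2` helper are hypothetical, chosen only for this illustration. Expiring rows that are missing from the source is left out to keep the sketch short.

```python
import sqlite3


def apply_scd2(conn: sqlite3.Connection, load_date: str) -> None:
    """Expire changed current rows, then insert new current versions."""
    # Step 1: expire current rows whose tracked attribute differs from the
    # staged value. IS NOT is SQLite's null-safe inequality operator.
    conn.execute(
        """
        UPDATE dim_customer
           SET scd_end_date = ?, is_current = 0
         WHERE is_current = 1
           AND EXISTS (SELECT 1 FROM stg_customer s
                        WHERE s.customer_key = dim_customer.customer_key
                          AND s.zipcode IS NOT dim_customer.zipcode)
        """,
        (load_date,),
    )
    # Step 2: insert a current row for every staged key that has none.
    # This covers brand-new keys and the keys expired in step 1.
    conn.execute(
        """
        INSERT INTO dim_customer (customer_key, zipcode, scd_start_date, is_current)
        SELECT s.customer_key, s.zipcode, ?, 1
          FROM stg_customer s
         WHERE NOT EXISTS (SELECT 1 FROM dim_customer d
                            WHERE d.customer_key = s.customer_key
                              AND d.is_current = 1)
        """,
        (load_date,),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (
        customer_key TEXT, zipcode TEXT,
        scd_start_date TEXT, scd_end_date TEXT, is_current INTEGER);
    CREATE TABLE stg_customer (customer_key TEXT, zipcode TEXT);
    INSERT INTO dim_customer VALUES ('C1', '14222', '2020-01-01', NULL, 1);
    INSERT INTO stg_customer VALUES ('C1', '14301'), ('C2', '10001');
    """
)
apply_scd2(conn, "2024-01-01")
rows = conn.execute(
    "SELECT customer_key, zipcode, is_current FROM dim_customer "
    "ORDER BY customer_key, scd_start_date"
).fetchall()
print(rows)
```

In this sample, customer C1 ends up with an expired row for the old zipcode plus a new current row, and C2 is inserted as a new current row. Filtering on the current-row flag in both steps is what lets step 2 recognise the keys that step 1 just expired.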

