Question
Let me first say that being able to take 17 million records from a flat file, push them to a DB on a remote box, and have the whole thing take 7 minutes is amazing. SSIS truly is fantastic. But now that I have that data up there, how do I remove duplicates?
Better yet, I want to take the flat file, remove the duplicates from it, and write the remaining rows to another flat file.
I was thinking of something like this:
Data Flow Task
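- A File Source (with its associated file connection)
- A For Loop Container
- A Script Container with some logic to tell whether another identical row exists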
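The "does another row like this exist?" check in the plan above amounts to remembering every row already written. A minimal Python sketch of that idea outside SSIS, assuming a duplicate means an exact repeat of the whole line and that the set of distinct rows fits in memory (the function names `unique_rows` and `dedupe_flat_file` are hypothetical):

```python
from typing import Iterable, Iterator


def unique_rows(lines: Iterable[str]) -> Iterator[str]:
    """Yield each row the first time it is seen, preserving input order.

    A set of already-seen rows stands in for the per-row
    "does another row exist?" lookup described in the question.
    """
    seen: set[str] = set()
    for line in lines:
        row = line.rstrip("\r\n")  # compare rows without their line endings
        if row not in seen:
            seen.add(row)
            yield row


def dedupe_flat_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, dropping rows that exactly repeat an earlier row."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for row in unique_rows(src):
            dst.write(row + "\n")


if __name__ == "__main__":
    sample = ["1|sample A\n", "2|sample B\n", "1|sample A\n"]
    print(list(unique_rows(sample)))  # ['1|sample A', '2|sample B']
```

With 17 million rows the memory cost of the set is the main thing to watch; storing a hash of each row instead of the row itself is one common way to shrink it.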
Thank you, and everyone on this site is incredibly knowledgeable.
Update:
I found this link, which might help answer this question.
Answer
Use the Sort component.
Simply choose which fields you want to sort the loaded rows by, and in the bottom-left corner you'll see a check box to remove duplicates. This option removes any rows that are duplicates based on the sort criteria only, so in the example below the two rows would be considered duplicates if we sorted only on the first field:
1 | sample A |
1 | sample B |
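To show why only the sort columns matter, here is a small Python sketch of the same idea (sort on the chosen key fields, then keep one row per distinct key). It is an illustration, not the Sort component's implementation: the helper names `parse_row` and `sort_and_drop_duplicates` are hypothetical, and which of the duplicate rows SSIS keeps is not something this sketch claims to reproduce (here the stable sort keeps the first one in input order).

```python
from itertools import groupby
from typing import Iterable, List, Sequence


def parse_row(line: str) -> List[str]:
    """Split a pipe-delimited line such as "1 | sample A |" into trimmed fields."""
    fields = [f.strip() for f in line.split("|")]
    # The trailing '|' produces an empty last field; drop it.
    return fields[:-1] if fields and fields[-1] == "" else fields


def sort_and_drop_duplicates(
    rows: Iterable[Sequence[str]], key_columns: Sequence[int]
) -> List[Sequence[str]]:
    """Sort rows on key_columns and keep one row per distinct key.

    Only the key columns decide what counts as a duplicate; other columns are ignored.
    """
    def key(row: Sequence[str]) -> tuple:
        return tuple(row[i] for i in key_columns)

    ordered = sorted(rows, key=key)  # stable sort: ties keep their input order
    # groupby needs sorted input; take the first row of each key group
    return [next(group) for _, group in groupby(ordered, key=key)]


if __name__ == "__main__":
    rows = [parse_row("1 | sample A |"), parse_row("1 | sample B |")]
    # Sorting on the first field only: the two rows collapse into one.
    print(sort_and_drop_duplicates(rows, key_columns=[0]))
    # Sorting on both fields: the rows are distinct, so both survive.
    print(sort_and_drop_duplicates(rows, key_columns=[0, 1]))
```

So to drop only rows that are identical in every column, include every column in the sort.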