I receive an error from time to time due to data in my source data set that is incompatible with my target data set. I would like to control the action the pipeline takes based on the error type, perhaps outputting or dropping those particular rows while still completing everything else. Is that possible? Furthermore, is there some simple way to get hold of the actual failing line(s) from Data Factory without accessing and searching the actual source data set?
Copy activity encountered a user error at Sink side: ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'Timestamp' contains an invalid value '11667'. Cannot convert '11667' to type 'DateTimeOffset'.,Source=Microsoft.DataTransfer.Common,''Type=System.FormatException,Message=String was not recognized as a valid DateTime.,Source=mscorlib,'.
I think you've hit a fairly common problem and limitation within ADF. The datasets you define with your JSON allow ADF to understand the structure of the data, but that is all: just the structure. The orchestration tool can't do anything to transform or manipulate the data as part of the activity processing.
To answer your question directly, it's certainly possible. But you need to break out the C# and use ADF's extensibility functionality to deal with your bad rows before passing them to the final destination.
I suggest you expand your data factory to include a custom activity where you can build some lower level cleaning processes to divert the bad rows as described.
This is an approach we often take, as not all data is perfect (I wish) and ETL or ELT alone doesn't work. I prefer the acronym ECLT, where the 'C' stands for clean (or cleanse, prepare, etc.). This certainly applies to ADF because the service doesn't have its own compute or SSIS-style data flow engine.
As for how to do this, first I recommend you check out this blog post on creating ADF custom activities. Link:
https://www.purplefrogsystems.com/paul/2016/11/creating-azure-data-factory-custom-activities/
Then, within your C# class that implements IDotNetActivity, do something like the below.
public IDictionary<string, string> Execute(
    IEnumerable<LinkedService> linkedServices,
    IEnumerable<Dataset> datasets,
    Activity activity,
    IActivityLogger logger)
{
    using (StreamReader vReader = new StreamReader(YourSource))
    using (StreamWriter vWriter = new StreamWriter(YourDestination))
    {
        while (!vReader.EndOfStream)
        {
            string vLine = vReader.ReadLine();
            // data transform logic: if bad row, divert or drop it, etc.
            vWriter.WriteLine(vLine);
        }
    }
    return new Dictionary<string, string>();
}
You get the idea. Build your own SSIS data flow!
Then write out your clean rows as an output dataset, which can be the input for your next ADF activity, either across multiple pipelines or as chained activities within a single pipeline.
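To make the "if bad row" step more concrete, here is a minimal sketch of the row-level logic, not a definitive implementation. It assumes comma-delimited text rows with the timestamp in a known column; the names RowCleaner, Split and timestampIndex are made up for illustration, and a real file with quoted fields would need a proper CSV parser. Rows whose timestamp parses as a DateTimeOffset go to the clean output, and everything else is diverted to a separate reject output so you can inspect the failing lines later.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical helper, not part of the ADF SDK.
public static class RowCleaner
{
    // Reads delimited rows from source, writes rows with a parseable
    // timestamp to clean and all other rows to rejected.
    // The comma delimiter and timestampIndex are assumptions.
    public static IDictionary<string, string> Split(
        TextReader source, TextWriter clean, TextWriter rejected, int timestampIndex)
    {
        int good = 0, bad = 0;
        string line;
        while ((line = source.ReadLine()) != null)
        {
            string[] fields = line.Split(',');
            DateTimeOffset parsed;
            // A row is good only if the column exists and its value parses.
            bool ok = fields.Length > timestampIndex
                && DateTimeOffset.TryParse(
                       fields[timestampIndex],
                       CultureInfo.InvariantCulture,
                       DateTimeStyles.None,
                       out parsed);

            if (ok) { clean.WriteLine(line); good++; }
            else { rejected.WriteLine(line); bad++; }
        }

        // Row counts returned as a string dictionary, the same shape
        // that Execute returns.
        return new Dictionary<string, string>
        {
            { "cleanRows", good.ToString() },
            { "rejectedRows", bad.ToString() }
        };
    }
}
```

Inside Execute you could call something like RowCleaner.Split(vReader, vWriter, vRejectWriter, 0), where vRejectWriter is a second StreamWriter pointing at wherever you want the bad rows kept. The reject file then answers the "which lines failed" question without searching the source data set.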
This is the only way you will get ADF to deal with your bad data in the current service offerings.