问题描述
我试图在Azure数据工厂中使用获取元数据"活动,以获取blob文件名并将其复制到Azure SQL数据库表中.我在获取元数据"活动之后添加了存储过程活动.这是我的新活动存储过程,我按照建议添加了参数,但是我将名称更改为JsonData,因为我的存储过程需要此参数.
I am trying to use Get Metadata activity in Azure Data Factory in order to get blob filenames and copy them to Azure SQL database table.I added the stored procedure activity after Get Metadata activity. Here is my new activity stored procedure, I added the parameter as suggested however, I changed the name to JsonData as my stored procedure requires this parameter.
这是我的存储过程.
/****** Object: StoredProcedure [dbo].[InsertDataJSON] Script Date: 10/14/2020 11:01:30 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
/*USE datafactorypoc1*/
ALTER PROCEDURE [dbo].[InsertDataJSON] (
@JsonData NVARCHAR (MAX)
)
AS
BEGIN
DECLARE @err int
INSERT INTO extractFileNameTest1 (ItemName, ItemType, EffIntegrationRunTieme, ExecutionDuration, DurationInQueue)
SELECT ItemName, ItemType, EffIntegrationRunTieme, ExecutionDuration, DurationInQueue
FROM OPENJSON (@JsonData, N'$') WITH (
ItemName VARCHAR(255) N'$.ItemName',
ItemType VARCHAR(255) N'$.ItemType',
EffIntegrationRunTieme VARCHAR(255) N'$.EffIntegrationRunTieme',
ExecutionDuration INT N'$.ExecutionDuration',
DurationInQueue INT N'$.DurationInQueue'
)
SELECT @err = @@ERROR
RETURN (@err)
END
在存储过程中出现此错误:
I get this error at the stored procedure:
{
"errorCode": "2402",
"message": "Execution fail against sql server. Sql error number: 13609. Error Message: JSON text is not properly formatted. Unexpected character 'S' is found at position 0.",
"failureType": "UserError",
"target": "Stored procedure1",
"details": []
}
但是当我检查输入时,似乎已经成功读取了json字符串itemName.
But when I check the input, it seems like it already successfully reading the json string itemName.
但是,当我检查输出时,它不存在.
But, when I check output, it's not there.
您能帮我检查一下我在这里做错了什么吗?是我的存储过程吗?提前非常感谢您.
Could you please help me check what I did wrong here? Is it my stored procedure? Thank you very much in advance.
更新15/10/2020 我创建了一个新管道,并在ForEach活动中移动了存储过程以连接到Get_Filename_2,而不是紧随其后的第一个广告:
Update 15/10/2020I created a new pipeline and move the Stored Procedure inside ForEach activity to connect to Get_Filename_2 instead of the first one ad followed:
在这里,我将参数的值更改为Get_Filename_2并作为itemName而不是childitems输出(因为使用childitems时出现错误,因为childitems来自Get_Filename_1而不是2).
Here I changed the value of parameter to Get_Filename_2 and output as itemName instead of childitems (because I got an error with using childitems because childitems is from Get_Filename_1 and not 2).
执行管道(失败)后,最后一个存储过程的输入为:
After executing the pipeline (which was failed), the input of the last stored procedure is:
{
"storedProcedureName": "[dbo].[InsertDataJSON]",
"storedProcedureParameters": {
"JsonData": {
"value": "FRRNSC84FIN1_A2276801_20200103-152534.json",
"type": "String"
}
}
}
输出为:
{
"effectiveIntegrationRuntime": "DefaultIntegrationRuntime (West Europe)",
"executionDuration": 0,
"durationInQueue": {
"integrationRuntimeQueue": 0
},
"billingReference": {
"activityType": "ExternalActivity",
"billableDuration": [
{
"meterType": "AzureIR",
"duration": 0.016666666666666666,
"unit": "Hours"
}
]
}
}
对于此管道,它失败并显示以下错误消息.
For this pipeline, it's failed with following error message.
{
"errorCode": "2402",
"message": "Execution fail against sql server. Sql error number: 13609. Error Message: JSON text is not properly formatted. Unexpected character 'F' is found at position 0.\r\nFRRNSC84FIN1_A2276801_20200103-152534.json",
"failureType": "UserError",
"target": "Stored procedure1",
"details": []
}
在我的旧管道中,我在ForEach循环之外存储了过程.管道没有失败:
On my old pipeline where I have stored procedure outside of ForEach loop. The pipeline did not fail:
这是上一个存储过程活动的输入:
Here is the input of the last stored procedure activity:
{
"storedProcedureName": "[dbo].[InsertDataJSON]",
"storedProcedureParameters": {
"JsonData": {
"value": "[{\"name\":\"FRRNSC84FIN1_A2274001_20200103-143748_back_camera_0.jpeg\",\"type\":\"File\"},{\"name\":\"FRRNSC84FIN1_A2274001_20200103-143748_right_camera_0.jpeg\",\"type\":\"File\"},{\"name\":\"FRRNSC84FIN1_A2274801_20200103-144811_right_camera_0.jpeg\",\"type\":\"File\"},{\"name\":\"FRRNSC84FIN1_A2275201_20200103-145229_right_camera_0.jpeg\",\"type\":\"File\"},{\"name\":\"FRRNSC84FIN1_A2276801_20200103-152534.json\",\"type\":\"File\"}]"
}
}
}
这是输出:
{
"effectiveIntegrationRuntime": "DefaultIntegrationRuntime (West Europe)",
"executionDuration": 0,
"durationInQueue": {
"integrationRuntimeQueue": 0
},
"billingReference": {
"activityType": "ExternalActivity",
"billableDuration": [
{
"meterType": "AzureIR",
"duration": 0.016666666666666666,
"unit": "Hours"
}
]
}
}
此管道成功运行,但是SQL中的结果不是我真正期望的,在第一个json字符串之前也包含许多NULL值,但是它将所有itemName插入相同的位置.这是因为我认为它不在ForEach循环之外.但是为什么要插入这么多NULL?
This pipeline ran successfully however the result in the SQL is not what I really expected, also many NULL values before the first json string but it insert all itemName in the same location. This is because it's outside of ForEach loop I think. But why it insert so many NULL?
这是我修改的存储过程:
Here is my stored procedure that I modified:
/****** Object: StoredProcedure [dbo].[InsertDataJSON] Script Date: 15/10/2020 10:31:51 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
/*USE datafactorypoc1*/
ALTER PROCEDURE [dbo].[InsertDataJSON] (
@JsonData NVARCHAR (MAX)
)
AS
BEGIN
PRINT @JsonData
/*INSERT INTO Logs values (DEFAULT, @JsonData)*/
INSERT INTO extractFileNameTest1 values (@JsonData, DEFAULT)
SELECT * FROM
OPENJSON (@JsonData)
WITH (
ItemName VARCHAR(255) N'$.ItemName',
ItemType VARCHAR(255) N'$.ItemType'
)
END
我试图删除测试SQL表中的所有数据,以查看第二条故障管道的输出.我注意到它确实正确解析了表中的json字符串文件名以及所有文件(我在blob存储中只有5个文件).但是其他数据为NULL.
I tried to delete all data in my test SQL table to see the output of the 2nd fail pipeline. I notice that it did parse correctly the json string filename inside my table and all of them (I only have 5 files in blob storage). But other data are NULL.
总而言之,新管道的SQL表中的结果要好得多,每个文件名都在表的不同单元格上,但是管道运行失败.您能帮我检查一下我做错了什么吗?是存储过程还是ForEach循环中我上一次存储过程活动中的表达式?
To conclude, the result in SQL table of the new pipeline is much better, each file name on the different cell of the table but I got pipeline run failed. Could you please help me check what I did wrong here? Is it the stored procedure or the expression in my last Stored procedure activity inside ForEach loop?
非常感谢您.
推荐答案
在这种情况下,我仅使用1 Get Metadata活动:获取元数据活动的数据集= Blob存储中的二进制文件获取元数据的字段列表=子项
For this scenario, I use only 1 Get Metadata activity with:Dataset of Get Metadata activity = Binary files from Blob storageField List of Get Metadata = Child items
此获取元数据"活动的输出已连接到ForEach活动:ForEach活动设置=>项目=>@activity('Get_FileName_1').output.childItems
This Get Metadata activity has output connected to ForEach activity:ForEach activity settings => Items => @activity('Get_FileName_1').output.childItems
在ForEach活动中,有1个活动是存储过程:存储过程设置已将链接的服务= Azure SQL数据库"与选定的存储过程链接在一起.在我编写的存储过程中,我定义了将从blob文件名读取的列.然后,对于存储过程"活动的参数,我定义了完全相同的参数编号=存储过程中@JsonData的所有列.然后,对于参数的每个值,我使用Azure数据工厂函数@substring提取blob文件名的不同部分.然后将它们插入到Azure SQL表的正确列中.
Inside ForEach activity, there is 1 activity which is Stored Procedure:Stored Procedure Settings has Linked Service = Azure SQL Database with selected Stored Procedure. In the stored procedure that I wrote, I defined the columns that I will read from blob file names. Then, for the parameters of Stored Procedure activity, I define exact same parameters numbers = all columns of @JsonData in stored procedure.Then, for each value of the parameter, I use Azure Data Factory function @substring to extract different parts of blob file name. Then insert these to the correct columns in Azure SQL table.
Source for Substring function: https://docs.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions#substring
这篇关于Azure数据工厂获取元数据以获取Blob文件名并将其传输到Azure SQL数据库表第2部分的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!