Problem description
I am trying to use the Get Metadata activity in Azure Data Factory to get blob filenames and copy them to an Azure SQL Database table. I followed this tutorial: https://www.mssqltips.com/sqlservertip/6246/azure-data-factory-get-metadata-example/
Here is my pipeline. Copy Data > Source points to the blob files in my Blob storage. I need to specify my source files as binary because they are *.jpeg files.
For my Copy Data > Sink, it is the Azure SQL database, and I enabled the option "Auto Create table".
In my Sink dataset config, I had to choose a table because validation won't pass if I don't select a table in my SQL database, even though this table is not related at all to the blob filenames that I want to get.
Question 1: Am I supposed to create a new table in the SQL DB beforehand, with columns matching the blob filenames that I want to extract?
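For illustration, a landing table for the filenames could be as simple as the sketch below; the table and column names are hypothetical, not from the tutorial.

-- Hypothetical landing table for the blob filenames (Azure SQL Database / T-SQL)
CREATE TABLE dbo.BlobFileNames
(
    FileName NVARCHAR(400) NOT NULL,
    ItemType NVARCHAR(50)  NULL,
    LoadedAt DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);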
Then I tried to validate the pipeline and got this error.
Copy_Data_1
Sink must be binary when source is binary dataset.
Question 2: How can I resolve this error? I had to select the source file type as binary, since that is one of the steps when creating the source dataset. However, when I chose the sink dataset, which is an Azure SQL table, I didn't have to select a dataset type, so the two don't seem to match.
Thank you very much in advance.
New screenshot of the new pipeline: I can now get the itemName of the filenames in the JSON output.
Now I add a Copy Data activity right after the Get_File_Name2 activity and connect them together, to try to use the JSON output as the source dataset.
However, I need to choose a source dataset location first before I can specify the type as JSON. As far as I understand, this JSON output comes from the Get_File_Name2 activity and is not stored in Blob storage. How do I make the Copy Data activity read this JSON output as its source dataset?
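As far as I understand, the activity output only exists inside the pipeline run, so it can only be referenced through an expression rather than mounted as a dataset; a hypothetical reference to that output would look like the line below.

@activity('Get_File_Name2').output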
Update 10/14/2020: Here is my new Stored Procedure activity. I added the parameter as suggested; however, I changed the name to JsonData because my stored procedure requires that parameter name.
This is my stored procedure.
I get this error at the stored procedure:
{
"errorCode": "2402",
"message": "Execution fail against sql server. Sql error number: 13609. Error Message: JSON text is not properly formatted. Unexpected character 'S' is found at position 0.",
"failureType": "UserError",
"target": "Stored procedure1",
"details": []
}
But when I check the input, it seems like it has already successfully read the JSON string with itemName.
However, when I check the output, it's not there.
Actually, you could use the Get Metadata output JSON as the parameter and then call the stored procedure: Get Metadata --> Stored Procedure
You just need to focus on the coding of the stored procedure.
Get Metadata output childItems:
{
    "childItems": [
        {
            "name": "DeploymentFiles.zip",
            "type": "File"
        },
        {
            "name": "geodatalake.pdf",
            "type": "File"
        },
        {
            "name": "test2.xlsx",
            "type": "File"
        },
        {
            "name": "word.csv",
            "type": "File"
        }
    ]
}
Stored Procedure:
@activity('Get Metadata1').output.childitems
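If the String parameter still receives text that is not valid JSON (the "Unexpected character 'S' is found at position 0" error above suggests the array was converted to its default string form rather than serialized), one possible fix, not verified here, is to serialize it explicitly with the string() expression function:

@string(activity('Get Metadata1').output.childItems)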
For how to create the stored procedure (getting the data out of the JSON object), you could refer to this blog: Retrieve JSON Data from SQL Server using a Stored Procedure.
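A minimal sketch of such a stored procedure, assuming it receives the childItems array in an NVARCHAR(MAX) parameter named JsonData and writes into the hypothetical dbo.BlobFileNames table sketched earlier; it shreds the array with OPENJSON (SQL Server 2016+ / Azure SQL Database):

-- Hypothetical procedure name; the JsonData parameter matches the name used in the pipeline above.
CREATE PROCEDURE dbo.usp_InsertBlobFileNames
    @JsonData NVARCHAR(MAX)  -- expects e.g. [{"name":"word.csv","type":"File"}, ...]
AS
BEGIN
    SET NOCOUNT ON;

    -- Guard against input that is not valid JSON (otherwise OPENJSON raises the
    -- "JSON text is not properly formatted" error seen above).
    IF ISJSON(@JsonData) = 0
    BEGIN
        RAISERROR('Parameter @JsonData does not contain valid JSON.', 16, 1);
        RETURN;
    END;

    -- Turn the childItems array into rows and load the filenames.
    INSERT INTO dbo.BlobFileNames (FileName, ItemType)
    SELECT [name], [type]
    FROM OPENJSON(@JsonData)
         WITH ([name] NVARCHAR(400) '$.name',
               [type] NVARCHAR(50)  '$.type');
END;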