Append data to an existing file in Azure Data Lake storage via the REST API

This article describes how to append data to an existing file in Azure Data Lake storage via the REST API; hopefully it provides a useful reference for anyone facing the same problem.

Problem description

I have set up a pipeline that fetches data from a REST API and drops it into ADLS storage Gen1, and I can also see the files being generated:

REST API > ADF pipeline (get bearer token + Copy activity) > ADLS

But when new data comes in from that API, it replaces the current content of the file instead of being appended after the last line each time.

Is there any dynamic setting or action I need to provide? Can someone please point me in the right direction?

Note: I can see the content inside the file; there are no errors at all.

Recommended answer

Given the nature of Blob storage, I don't think this is possible with a standard Copy activity. Azure Blobs have several types, the most common of which is BlockBlob, which is almost surely the type generated by ADF operations. A BlockBlob cannot be changed or updated, only overwritten, which explains the behavior you are experiencing. In order to update the content of a blob, it must be defined as an AppendBlob, which permits adding new content. The AppendBlob type must be declared when the Blob is created.

The only way I know to accomplish this (both creating the AppendBlob and appending content to it) is via the Azure Storage SDK, which includes classes and methods specific to dealing with AppendBlobs. This operation would require some custom code (I'll assume a C# console app) to access the storage account and append to the Blob. In order to incorporate this into your pipeline, you will need to replace your Copy activity with a Custom activity, which will execute your C# code in an Azure Batch account.
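
For reference, here is a minimal sketch of that approach using the current Azure Storage SDK for .NET (the Azure.Storage.Blobs package, whose AppendBlobClient class lives in Azure.Storage.Blobs.Specialized). This is an illustration only; the connection string, container name, blob name, and payload below are placeholders, not values from the question.

using System;
using System.IO;
using System.Text;
using Azure.Storage.Blobs.Specialized;

class AppendToBlob
{
    static void Main()
    {
        // Placeholder: the storage connection string is read from an environment variable.
        string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");

        // AppendBlobClient works specifically with blobs of type AppendBlob.
        var appendBlob = new AppendBlobClient(connectionString, "mycontainer", "api-output.json");

        // The blob must be created as an AppendBlob before anything can be appended to it.
        appendBlob.CreateIfNotExists();

        // Append the latest payload as a new block after the existing content,
        // rather than overwriting the file the way the Copy activity does.
        string latestPayload = "{\"value\": 42}\n";
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(latestPayload));
        appendBlob.AppendBlock(stream);
    }
}

In the pipeline, code along these lines would run inside the Custom activity (backed by an Azure Batch linked service) in place of the Copy activity.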

An alternative to consider is permitting the Copy activity to generate a new blob every time instead of trying to work with a single Blob. Data lakes and MPPs in general are designed to work with many files at a time, so depending on your use case that may be a more reasonable approach.
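
If you take that route, one common pattern is to give the sink dataset a dynamic file name so every pipeline run writes a distinct file. A rough sketch of what the relevant dataset fragment might look like for ADLS Gen1 (the dataset name, folder path, and file-name pattern are placeholders, and the linked service reference is omitted):

{
  "name": "AdlsSinkDataset",
  "properties": {
    "type": "AzureDataLakeStoreFile",
    "typeProperties": {
      "folderPath": "raw/api-data",
      "fileName": {
        "value": "@concat('payload_', formatDateTime(utcNow(), 'yyyyMMddHHmmss'), '.json')",
        "type": "Expression"
      }
    }
  }
}

Each run then produces a new timestamped file in the same folder, which downstream tools can read together as a set.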

This concludes the article on appending data to an existing file in Azure Data Lake storage via the REST API; we hope the recommended answer above is helpful.
