This article describes how to move data from a database into Azure Blob storage; it should be a useful reference for anyone facing the same problem.

Problem description

I'm able to use dask.dataframe.read_sql_table to read the data e.g. df = dd.read_sql_table(table='TABLE', uri=uri, index_col='field', npartitions=N)
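
For reference, a slightly fuller sketch of that read step is shown below. The connection URI, index column and partition count are placeholders, and the keyword arguments mirror the call above; parameter names have changed in more recent Dask releases, so check the read_sql_table documentation for the version you have installed.

import dask.dataframe as dd

# Hypothetical SQLAlchemy connection URI; substitute your own database details.
uri = 'postgresql://user:password@host:5432/dbname'

df = dd.read_sql_table(
    table='TABLE',       # name of the table to read
    uri=uri,             # SQLAlchemy-compatible connection string
    index_col='field',   # indexed numeric/date column used to split the partitions
    npartitions=10,      # number of partitions to split the table into
)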

What would be the next (best) steps to save it as a parquet file in Azure Blob storage?

From my small research there are a couple of options:

  • Save locally and upload with AzCopy (https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs?toc=/azure/storage/blobs/toc.json), which is not great for big data
  • I believe adlfs is for reading from blob storage
  • Use dask.dataframe.to_parquet and work out how to point it at the blob container
  • The intake project (not sure where to start)

Recommended answer

$ pip install adlfs

import dask.dataframe as dd

dd.to_parquet(
    df=df,
    # 'abfs://' is the protocol registered by adlfs; the first path segment is the container
    path='abfs://{BLOB}/{FILE_NAME}.parquet',
    storage_options={'account_name': 'ACCOUNT_NAME',
                     'account_key': 'ACCOUNT_KEY'},
)
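
Since adlfs also handles reads (one of the options listed above), the same storage_options can be passed to dask.dataframe.read_parquet to check the result. A minimal sketch using the same placeholder account, container and file names:

df_check = dd.read_parquet(
    'abfs://{BLOB}/{FILE_NAME}.parquet',
    storage_options={'account_name': 'ACCOUNT_NAME',
                     'account_key': 'ACCOUNT_KEY'},
)
print(df_check.head())

Note that dask.dataframe.to_parquet writes one part file per partition under the given path by default, so {FILE_NAME}.parquet ends up as a directory of parquet files rather than a single file.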

That concludes this article on moving data from a database to Azure Blob storage; hopefully the recommended answer above is helpful.
