This article describes how to move data from a database to Azure Blob storage, and should be a useful reference for anyone facing the same problem.
Problem Description
I'm able to use dask.dataframe.read_sql_table to read the data, e.g. df = dd.read_sql_table(table='TABLE', uri=uri, index_col='field', npartitions=N).
What would be the next (best) steps to save it as a Parquet file in Azure Blob storage?
From my brief research, there are a couple of options:
- Save locally and use https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs?toc=/azure/storage/blobs/toc.json (not great for big data)
- I believe adlfs is for reading from blob storage (see the sketch after this list)
- Use dask.dataframe.to_parquet and work out how to point to the blob container
- The intake project (not sure where to start)
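For what it's worth, adlfs supports writing as well as reading: it registers the abfs:// (and az://) protocol with fsspec, so fsspec-aware libraries such as Dask can target Blob storage directly. A minimal connectivity check, where the container name 'my-container' and the ACCOUNT_NAME/ACCOUNT_KEY credentials are placeholders, might look like this:

import fsspec

# adlfs provides AzureBlobFileSystem behind the 'abfs' protocol;
# the account name, key and container below are placeholders.
fs = fsspec.filesystem(
    "abfs",
    account_name="ACCOUNT_NAME",
    account_key="ACCOUNT_KEY",
)

# List a container to confirm the credentials work, then write a tiny test file.
print(fs.ls("my-container"))
with fs.open("my-container/test.txt", "w") as f:
    f.write("hello from adlfs")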
Recommended Answer
$ pip install adlfs
import dask.dataframe as dd

# Write the Dask dataframe to Azure Blob storage through the 'abfs://'
# protocol registered by adlfs ({BLOB} is the container and {FILE_NAME}
# the output name; both are placeholders).
dd.to_parquet(
    df=df,
    path='abfs://{BLOB}/{FILE_NAME}.parquet',
    storage_options={'account_name': 'ACCOUNT_NAME',
                     'account_key': 'ACCOUNT_KEY'},
)
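As a sanity check, the same storage_options can be passed to dask.dataframe.read_parquet to read the dataset back from Blob storage; the container and file names below are the same placeholders used above:

import dask.dataframe as dd

# Read the Parquet dataset back from the container to verify the round trip.
df_check = dd.read_parquet(
    'abfs://{BLOB}/{FILE_NAME}.parquet',
    storage_options={'account_name': 'ACCOUNT_NAME',
                     'account_key': 'ACCOUNT_KEY'},
)
print(df_check.head())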
This concludes the article on moving data from a database to Azure Blob storage. Hopefully the recommended answer is helpful.