Problem description
I would like to append data onto a published dask dataset from a queue (like Redis). Other Python programs would then be able to fetch the latest data (e.g. once per second/minute) and do some further operations.
- Would that be possible?
- Which append interface should be used? Should I load the data into a pd.DataFrame first, or is it better to use some text importer?
- What are the expected append speeds? Is it possible to append, say, 1k/10k rows per second?
- Are there other good suggestions for exchanging huge and rapidly updating datasets within a Dask cluster?
Thanks for any hints and suggestions.
Recommended answer
You have a couple of options here.
- You could take a look at the streamz project
- You could take a look at Dask's coordination primitives (a minimal sketch follows this list)
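As a rough illustration of the coordination-primitives route, here is a minimal sketch using a distributed.Queue to pass small batches between processes. It assumes a running dask.distributed scheduler reachable by both producer and consumer; the scheduler address and the queue name "incoming-rows" are placeholders, not something from the original answer.

```python
# Sketch only: share small batches between processes via a named distributed Queue.
# Assumes a dask.distributed scheduler is running at the (hypothetical) address below.
from dask.distributed import Client, Queue

client = Client("tcp://scheduler-address:8786")  # assumed scheduler address

# Producer process: drain rows from Redis (or any source) and push them as batches.
batches = Queue("incoming-rows")  # named queue, coordinated through the scheduler
batches.put([{"ts": 1, "value": 0.1}, {"ts": 2, "value": 0.2}])

# Consumer process (a separate Client connected to the same scheduler):
# consumer = Client("tcp://scheduler-address:8786")
# incoming = Queue("incoming-rows")
# rows = incoming.get()            # blocks until a batch is available
# ... build a DataFrame from the rows and continue processing ...
```

Whether you pass plain Python lists/dicts or whole pandas DataFrames through such a queue is a representation choice, which is exactly what dominates throughput in practice.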
Dask is just tracking remote data. The speed of your application has a lot more to do with how you choose to represent that data (e.g. Python lists vs. pandas DataFrames) than with Dask itself. Dask can handle thousands of tasks per second. Each of those tasks could hold a single row or millions of rows; it depends on how you build it.
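For the published-dataset part of the question, one common pattern is to republish the named dataset whenever new rows arrive, so that other clients can fetch the latest snapshot by name. The sketch below is an assumption-laden illustration, not the answer's prescribed method: the scheduler address and the dataset name "latest" are hypothetical, and batching/locking concerns are ignored.

```python
# Sketch only: append by republishing a named dask dataset on a shared scheduler.
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")  # assumed scheduler address

def append_batch(new_rows: pd.DataFrame, name: str = "latest") -> None:
    """Append a batch of rows by concatenating and republishing the dataset."""
    new_part = dd.from_pandas(new_rows, npartitions=1)
    if name in client.list_datasets():
        current = client.get_dataset(name)
        updated = dd.concat([current, new_part])
        client.unpublish_dataset(name)
    else:
        updated = new_part
    # persist() keeps the data in cluster memory so readers don't recompute it
    client.publish_dataset(updated.persist(), name=name)

# Producer side: drain a queue (Redis, etc.) into the published dataset.
append_batch(pd.DataFrame({"ts": [1, 2], "value": [0.1, 0.2]}))

# Consumer side (another process with its own Client): fetch the latest snapshot.
# reader = Client("tcp://scheduler-address:8786")
# df = reader.get_dataset("latest").compute()
```

With this kind of setup, appending 1k-10k rows per second is more a question of how large and how frequent the republished batches are than of Dask's task overhead.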