This article explains why summing a Dask DataFrame column returns a Scalar instead of a number, and how to get the numeric value.

Problem description

I've created a Dask DataFrame (called "df"), and the column with index 11 has integer values:

In [62]: df[11]
Out[62]:
Dask Series Structure:
npartitions=42
    int64
      ...
    ...
      ...
      ...
Name: 11, dtype: int64
Dask Name: getitem, 168 tasks

I'm trying to sum it with:

df[11].sum()

I get dd.Scalar<series-..., dtype=int64> returned. Despite researching what this might mean, I still don't understand why I'm not getting a numerical value back. How can I convert this into its numerical value?

Recommended answer

I think you need compute(), which tells Dask to actually execute all the lazy operations built up before it:

This turns a lazy Dask collection into its in-memory equivalent. For example, a Dask array turns into a NumPy array and a Dask DataFrame turns into a pandas DataFrame. The entire result must fit into memory before calling this operation; for a reduction such as a column sum, that result is a single number.

df[11].sum().compute()

08-21 18:07