Problem description
I've created a Dask DataFrame (called "df"), and the column labeled 11 holds integer values:
In [62]: df[11]
Out[62]:
Dask Series Structure:
npartitions=42
int64
...
...
...
...
Name: 11, dtype: int64
Dask Name: getitem, 168 tasks
I'm trying to sum it with:
df[11].sum()
What I get back is dd.Scalar<series-..., dtype=int64>. Despite researching what this might mean, I'm still at a loss as to why I'm not getting a numerical value returned. How can I convert this into its numerical value?
Recommended answer
I think you need compute, which tells Dask to actually run everything that came before:
This turns a lazy Dask collection into its in-memory equivalent. For example, a Dask array turns into a NumPy array and a Dask DataFrame turns into a pandas DataFrame. The computed result must fit into memory, so check that before calling this operation; for a reduction such as sum(), the result is only a single number.
df[11].sum().compute()