Sampling n = 2000 from a Dask dataframe of len 18000 generates the error "Cannot take a larger sample than population when 'replace=False'"

Problem

I have a Dask dataframe created from a CSV file, and len(daskdf) returns 18000, but when I run ddSample = daskdf.sample(2000) I get the error:

ValueError: Cannot take a larger sample than population when 'replace=False'

Can I sample without replacement if the dataframe is larger than the sample size?

Answer

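The sample method only supports the frac= keyword argument; see the API documentation. In the Dask version used in the question, the first positional argument of sample is frac, so daskdf.sample(2000) asks for 2000 times as many rows as the dataframe holds, not 2000 rows.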

The error that you're getting is from Pandas, not Dask.

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'x': [1]})
In [3]: df.sample(frac=2000, replace=False)
ValueError: Cannot take a larger sample than population when 'replace=False'
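The number of rows a given frac asks for can be worked out by hand. Dask runs pandas' sample on every partition with the same frac, and pandas converts frac into a row count with round(frac * len(partition)). The helper expected_sample_size below is hypothetical, written only to show that arithmetic; the single 18000-row partition in the first call is also an assumption about the question's frame.

```python
def expected_sample_size(partition_lengths, frac):
    """Hypothetical helper: total rows requested when every partition is
    sampled with the same frac and each count is rounded separately."""
    return sum(round(frac * length) for length in partition_lengths)


# The question's call, assuming one 18000-row partition: frac=2000 asks
# for 2000 * 18000 rows, far more than exist, so replace=False must fail.
print(expected_sample_size([18000], 2000))  # 36000000

# frac = 2000 / 18000 over four 4500-row partitions: 500 rows each.
print(expected_sample_size([4500] * 4, 2000 / 18000))  # 2000

# Per-partition rounding can drift: 10 / 21 of three 7-row partitions
# gives round(3.33) = 3 rows each, 9 in total rather than 10.
print(expected_sample_size([7, 7, 7], 10 / 21))  # 9
```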



Solution

As the Pandas error suggests, consider sampling with replacement:

In [4]: df.sample(frac=2, replace=True)
Out[4]: 
   x
0  1
0  1

In [5]: import dask.dataframe as dd
In [6]: ddf = dd.from_pandas(df, npartitions=1)
In [7]: ddf.sample(frac=2, replace=True).compute()
Out[7]: 
   x
0  1
0  1

10-19 01:01