Sampling n = 2000 from a Dask DataFrame of len 18000 raises "Cannot take a larger sample than population when 'replace=False'"
Problem description
I have a Dask DataFrame created from a CSV file, and len(daskdf) returns 18000, but when I run ddSample = daskdf.sample(2000) I get the error:
ValueError: Cannot take a larger sample than population when 'replace=False'
Can I sample without replacement if the dataframe is larger than the sample size?
Recommended answer
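The sample method only supports the frac= keyword argument, so the positional 2000 in daskdf.sample(2000) is interpreted as frac=2000: a sample 2000 times the size of the data, not 2000 rows. See the API documentation.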
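The error you are getting comes from pandas, not Dask: Dask samples each partition with pandas' DataFrame.sample, so the same error can be reproduced with pandas alone.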
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'x': [1]})
In [3]: df.sample(frac=2000, replace=False)
ValueError: Cannot take a larger sample than population when 'replace=False'
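To make the size of the request concrete, here is a pandas-only sketch (not part of the original answer) on a hypothetical 18,000-row frame standing in for the CSV data. It shows that frac=2000 asks for 2000 * len(df) rows, which cannot be drawn without replacement. The exact error text may differ between pandas versions, so the sketch only checks that a ValueError is raised.

```python
import pandas as pd

# Hypothetical stand-in for the 18,000-row CSV data from the question.
df = pd.DataFrame({"x": range(18_000)})

# When 2000 is read as frac, the request is 2000 * len(df) rows, not 2000 rows.
requested_rows = 2000 * len(df)
print(requested_rows)  # 36000000

raised = False
try:
    # Without replacement, pandas cannot draw more rows than the frame contains.
    df.sample(frac=2000, replace=False)
except ValueError as err:
    raised = True
    # The wording of this message depends on the pandas version.
    print("ValueError:", err)
```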
Solution
As the pandas error suggests, consider sampling with replacement:
In [4]: df.sample(frac=2, replace=True)
Out[4]:
x
0 1
0 1
In [5]: import dask.dataframe as dd
In [6]: ddf = dd.from_pandas(df, npartitions=1)
In [7]: ddf.sample(frac=2, replace=True).compute()
Out[7]:
x
0 1
0 1
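If the actual goal is about 2000 of the 18000 rows without replacement, another option (not part of the original answer) is to convert the desired count into a fraction, frac = n / len(df). The helper sample_n_rows below is a hypothetical, minimal sketch of that idea. It assumes only that the object supports len() and .sample(frac=..., replace=..., random_state=...), which pandas DataFrames do; the demo uses pandas so it runs without Dask.

```python
import pandas as pd


def sample_n_rows(df, n, random_state=None):
    """Hypothetical helper: draw about n rows without replacement via frac."""
    total = len(df)  # for a Dask DataFrame this triggers a computation
    if not 0 <= n <= total:
        raise ValueError(f"n={n} must be between 0 and the row count {total}")
    # Convert the desired row count into the fraction that sample() expects.
    frac = n / total if total else 0.0
    return df.sample(frac=frac, replace=False, random_state=random_state)


# Hypothetical 18,000-row frame mirroring the question's data size.
df = pd.DataFrame({"x": range(18_000)})
sample = sample_n_rows(df, 2000, random_state=0)
print(len(sample))  # 2000
print(sample["x"].is_unique)  # True: no row is drawn twice
```

With a Dask DataFrame such as daskdf, the returned sample is lazy, so call .compute() on it. Because Dask applies the same fraction to every partition and each partition rounds its own sample size, the total may differ slightly from n when there are several partitions.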