I am trying to run KMeans in parallel with scikit-learn, but I keep getting the following error:

Traceback (most recent call last):
  File "run_kmeans.py", line 114, in <module>
    kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 889, in fit
    return_n_iter=True)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 362, in k_means
    for seed in seeds)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 768, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 719, in retrieve
    raise exception
sklearn.externals.joblib.my_exceptions.JoblibIndexError: JoblibIndexError
_________________________________________________________________________
Multiprocessing exception:
..........................................................................
IndexError: index 11683 is out of bounds for axis 0 with size 11683


When I run KMeans with n_jobs=1, that is, sequentially, there is no error and everything works. With n_jobs=-1, however, I always get the error above.

This is the code I am using:

kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)


descriptors is a numpy array of shape (11683, 128).



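Am I doing something wrong, or is this a bug in the KMeans implementation?

What should I do instead (for example, use MiniBatchKMeans)?

PS: I am running this on a 64-bit Ubuntu 16.04 machine with 4 GB of RAM and an Intel Core i7-4700HQ 2.40GHz.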

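Best answer

This can be fixed by converting the input data to float64 before fitting, that is, by passing descriptors.astype(np.float64) instead of descriptors.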

https://github.com/scikit-learn/scikit-learn/issues/8583
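
As a rough sketch of the fix described above, the cast can be wrapped in a small helper and applied right before fitting. The helper names as_float64 and fit_kmeans are hypothetical, the random stand-in data only mimics the shape from the question, and float32 is an assumed original dtype (the question does not state it).

```python
import numpy as np


def as_float64(descriptors):
    """Return the data as a C-contiguous float64 array (copies only if needed)."""
    return np.ascontiguousarray(descriptors, dtype=np.float64)


def fit_kmeans(descriptors, n_clusters, **kmeans_kwargs):
    """Hypothetical wrapper: cast to float64, then fit KMeans.

    Extra keyword arguments (for example n_jobs on older scikit-learn
    releases) are passed through to KMeans unchanged.
    """
    # Imported lazily so the casting helper can be used without scikit-learn.
    from sklearn.cluster import KMeans

    return KMeans(n_clusters=n_clusters, **kmeans_kwargs).fit(as_float64(descriptors))


# Stand-in data with the shape from the question; float32 is an assumption.
rng = np.random.RandomState(0)
descriptors = rng.rand(11683, 128).astype(np.float32)

descriptors64 = as_float64(descriptors)
print(descriptors64.dtype, descriptors64.shape)
```

On the scikit-learn version shown in the traceback, the call from the question would then become fit_kmeans(descriptors, 2048, n_jobs=-1). Newer scikit-learn releases no longer accept n_jobs for KMeans, so that argument would have to be dropped there.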

Source: Stack Overflow, "Index N is out of bounds for axis 0 with size N when running parallel KMeans, while sequential KMeans works fine": https://stackoverflow.com/questions/41635426/
