Why is KNN with a custom metric so slow?

Problem description

I work with a data set of about 200k objects. Each object has 4 features. I classify them with K nearest neighbors (KNN) using the Euclidean metric. The process finishes in about 20 seconds.

Lately I've had a reason to use a custom metric; it will probably give better results. I implemented the custom metric, and KNN then ran for more than an hour. I didn't wait for it to finish.

I assumed the cause was my metric, so I replaced its body with return 1. KNN still ran for more than an hour. I then assumed the cause was the Ball Tree algorithm, but KNN with Ball Tree and the Euclidean metric finishes in about 20 seconds.

Right now I have no idea what's wrong. I use Python 3 and sklearn 0.17.1. With a custom metric the process never finishes. I also tried the brute algorithm, but it has the same effect. Upgrading and downgrading scikit-learn has no effect. Running the classification with the custom metric on Python 2 doesn't help either. I also implemented this metric (just return 1) in Cython, with the same result.

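import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def custom_metric(x: np.ndarray, y: np.ndarray) -> float:
    # Placeholder body: even this constant metric is slow.
    return 1

# X: feature matrix (about 200k rows, 4 columns); Y: class labels
clf = KNeighborsClassifier(n_jobs=1, metric=custom_metric)
clf.fit(X, Y)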

Can I speed up KNN classification with a custom metric?

Sorry if my English is not clear.

Recommended answer

Sklearn is optimized: it uses Cython and several processes to run as fast as possible. Pure Python code is what slows your code down, especially when it is called many times. With a callable metric, the function is invoked once for every pair of points that is compared, so even a body that only returns 1 pays the Python call overhead each time. I recommend that you write your custom metric in Cython. There is a tutorial you can follow here: https://blog.sicara.com/https-medium-com-redaboumahdi-speed-sklearn-algorithms-custom-metrics-using-cython-de92e5a325c
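To make the cost of repeated calls visible, here is a minimal sketch, assuming a recent scikit-learn and NumPy. The names counting_metric, calls and fit_predict, the small synthetic data set, and the choice of Manhattan distance as the "custom" metric are all hypothetical stand-ins, not the asker's actual metric. It counts how often the Python metric is called and compares the run against the equivalent built-in metric.

```python
import time

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Mutable counter so the metric can record how often it is called.
calls = {"n": 0}


def counting_metric(x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical custom metric: Manhattan distance written in Python."""
    calls["n"] += 1
    return float(np.abs(x - y).sum())


# Small synthetic data set (assumption): 4 features, as in the question.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 4))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(50, 4))


def fit_predict(metric):
    """Fit brute-force KNN with the given metric; return predictions and seconds."""
    clf = KNeighborsClassifier(n_neighbors=5, algorithm="brute", metric=metric)
    start = time.perf_counter()
    pred = clf.fit(X_train, y_train).predict(X_test)
    return pred, time.perf_counter() - start


# Same distance twice: once as a Python callable, once as a built-in name.
pred_callable, t_callable = fit_predict(counting_metric)
pred_builtin, t_builtin = fit_predict("manhattan")

print(f"Python metric calls: {calls['n']}")
print(f"callable: {t_callable:.3f}s, built-in: {t_builtin:.3f}s")
print("same predictions:", np.array_equal(pred_callable, pred_builtin))
```

With brute force, the call count grows with the number of query points times the number of training points. If each of 200k objects were compared with every other, that would be on the order of 4 × 10^10 Python calls, so when a custom metric can be expressed as one of the built-in metric names, that is likely the cheaper route.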

