When using something like this:

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X,y)
predictions = clf.predict_proba(X_test)


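How can I restrict the prediction to a single class? This matters for performance, for example when I have thousands of classes but only care whether one particular class has a high probability.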

Best answer

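Sklearn does not implement this, so you will have to write some kind of wrapper. For example, you can extend the KNeighborsClassifier class and override the predict_proba method.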
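As a minimal sketch of such a wrapper (not part of scikit-learn): the subclass name `SingleClassKNN`, the attribute `y_train_`, and the method `predict_proba_single` are hypothetical names chosen for illustration, and the sketch assumes the default `weights='uniform'` and a single output. It keeps a copy of the training labels so it does not rely on private attributes, and uses only the public `kneighbors` method.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


class SingleClassKNN(KNeighborsClassifier):
    """Hypothetical wrapper: adds a per-class probability method."""

    def fit(self, X, y):
        # Keep our own copy of the labels instead of relying on private state.
        self.y_train_ = np.asarray(y)
        return super().fit(X, y)

    def predict_proba_single(self, X, target_class):
        # Assumption: uniform weights, so the probability is a plain fraction.
        if self.weights not in (None, "uniform"):
            raise ValueError("this sketch only handles weights='uniform'")
        # Indices of the k nearest training points for each row of X.
        neigh_ind = self.kneighbors(X, return_distance=False)
        # Fraction of neighbors whose label equals the target class.
        return (self.y_train_[neigh_ind] == target_class).mean(axis=1)


# Toy data, invented purely for the example.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 0, 1, 1, 2])
X_test = np.array([[0.05], [1.15]])

clf = SingleClassKNN(n_neighbors=3).fit(X, y)
p_class_1 = clf.predict_proba_single(X_test, target_class=1)
```

The neighbor search costs the same as before; what this avoids is building the full (n_samples, n_classes) probability array.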

According to the source code:

    def predict_proba(self, X):
        """Return probability estimates for the test data X.

        Parameters
        ----------
        X : array, shape = (n_samples, n_features)
            A 2-D array representing the test points.

        Returns
        -------
        p : array of shape = [n_samples, n_classes], or a list of n_outputs
            of such arrays if n_outputs > 1.
            The class probabilities of the input samples. Classes are ordered
            by lexicographic order.
        """
        X = atleast2d_or_csr(X)

        neigh_dist, neigh_ind = self.kneighbors(X)

        classes_ = self.classes_
        _y = self._y
        if not self.outputs_2d_:
            _y = self._y.reshape((-1, 1))
            classes_ = [self.classes_]

        n_samples = X.shape[0]

        weights = _get_weights(neigh_dist, self.weights)
        if weights is None:
            weights = np.ones_like(neigh_ind)

        all_rows = np.arange(X.shape[0])
        probabilities = []
        for k, classes_k in enumerate(classes_):
            pred_labels = _y[:, k][neigh_ind]
            proba_k = np.zeros((n_samples, classes_k.size))

            # a simple ':' index doesn't work right
            for i, idx in enumerate(pred_labels.T):  # loop is O(n_neighbors)
                proba_k[all_rows, idx] += weights[:, i]

            # normalize 'votes' into real [0,1] probabilities
            normalizer = proba_k.sum(axis=1)[:, np.newaxis]
            normalizer[normalizer == 0.0] = 1.0
            proba_k /= normalizer

            probabilities.append(proba_k)

        if not self.outputs_2d_:
            probabilities = probabilities[0]

        return probabilities


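Simply modify the code: change the body of the for k, classes_k in enumerate(classes_): loop so that it accumulates votes only for the one class you need, instead of filling a full (n_samples, n_classes) array.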
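One way to read that suggestion, sketched as a standalone function rather than a patch to scikit-learn: the name `single_class_proba` is hypothetical, the training labels are passed in explicitly, and the inverse-distance weighting is a hand-written approximation of what the library does for `weights='distance'` (callable weights are not handled). The vote loop follows the one in the quoted source but keeps a single running total per sample.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def single_class_proba(clf, y_train, X_test, target_class):
    """Probability of one class only, for a fitted single-output KNN classifier."""
    y_train = np.asarray(y_train)
    neigh_dist, neigh_ind = clf.kneighbors(X_test)

    if clf.weights in (None, "uniform"):
        weights = np.ones_like(neigh_dist)
    elif clf.weights == "distance":
        # Inverse-distance votes; a zero distance takes all the weight in its row.
        with np.errstate(divide="ignore"):
            weights = 1.0 / neigh_dist
        zero_rows = np.isinf(weights).any(axis=1)
        weights[zero_rows] = np.isinf(weights[zero_rows])
    else:
        raise ValueError("this sketch does not handle callable weights")

    neighbor_labels = y_train[neigh_ind]
    votes = np.zeros(neigh_ind.shape[0])
    total = np.zeros(neigh_ind.shape[0])
    for i in range(neighbor_labels.shape[1]):  # loop is O(n_neighbors)
        votes += weights[:, i] * (neighbor_labels[:, i] == target_class)
        total += weights[:, i]

    total[total == 0.0] = 1.0  # same guard as the normalizer in the source
    return votes / total


# Toy data, invented purely for the example.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 0, 1, 1, 2])
X_test = np.array([[0.05], [1.15]])

knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)
p_one = single_class_proba(knn, y, X_test, target_class=1)
```

Comparing `p_one` against the matching column of `knn.predict_proba(X_test)` on a small sample is a cheap sanity check before relying on it.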

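A hacky alternative is to overwrite the classes_ variable so that it becomes a singleton containing only the class of interest, and to restore it when you are done. Note that in the code above the labels stored in _y are indices into the full class list, so shrinking classes_ alone would leave proba_k too narrow for those indices; the labels would need remapping as well.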

On the topic of machine-learning, "how to limit prediction probabilities to one class", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18364065/
