When using something like this:
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
predictions = clf.predict_proba(X_test)
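how can I restrict the prediction to a single class? This is needed for performance reasons, for example when I have thousands of classes but am only interested in whether one particular class has a high probability.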
Best answer
Sklearn does not implement this, so you will have to write some kind of wrapper. For example, you can extend the KNeighborsClassifier class and override its predict_proba method.
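A minimal sketch of such a wrapper, assuming a hypothetical subclass SingleClassKNN with a hypothetical extra method predict_proba_one (rather than changing the signature of predict_proba itself). It relies only on the public kneighbors method and keeps its own copy of the training labels instead of scikit-learn's private attributes. It assumes 1-D y and handles only weights='uniform' or 'distance', mirroring how scikit-learn weights the votes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


class SingleClassKNN(KNeighborsClassifier):
    """Hypothetical KNN subclass that can score a single class.

    Sketch only: assumes 1-D targets and weights='uniform' or 'distance'.
    """

    def fit(self, X, y):
        # Keep our own copy of the raw training labels so we do not depend
        # on private attributes that change between scikit-learn versions.
        self.train_labels_ = np.asarray(y)
        return super().fit(X, y)

    def predict_proba_one(self, X, label):
        """Return P(label | x) for every row of X, shape (n_samples,)."""
        neigh_dist, neigh_ind = self.kneighbors(X)

        if self.weights in (None, "uniform"):
            weights = np.ones_like(neigh_dist)
        elif self.weights == "distance":
            with np.errstate(divide="ignore"):
                weights = 1.0 / neigh_dist
            # If a query has zero-distance neighbours, only those vote.
            inf_mask = np.isinf(weights)
            inf_row = inf_mask.any(axis=1)
            weights[inf_row] = inf_mask[inf_row]
        else:
            raise NotImplementedError("only 'uniform' and 'distance' weights")

        # One boolean mask instead of an (n_samples, n_classes) array.
        is_label = self.train_labels_[neigh_ind] == label
        votes = (weights * is_label).sum(axis=1)
        total = weights.sum(axis=1)
        total[total == 0.0] = 1.0  # same guard as predict_proba
        return votes / total
```

Usage would be clf = SingleClassKNN(n_neighbors=3).fit(X, y) followed by clf.predict_proba_one(X_test, some_label). Note that the neighbour search itself is unchanged; the saving is only that no per-class probability matrix is allocated.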
According to the source code (from an older scikit-learn release; current versions differ in the details):
def predict_proba(self, X):
    """Return probability estimates for the test data X.

    Parameters
    ----------
    X : array, shape = (n_samples, n_features)
        A 2-D array representing the test points.

    Returns
    -------
    p : array of shape = [n_samples, n_classes], or a list of n_outputs
        of such arrays if n_outputs > 1.
        The class probabilities of the input samples. Classes are ordered
        by lexicographic order.
    """
    X = atleast2d_or_csr(X)
    neigh_dist, neigh_ind = self.kneighbors(X)
    classes_ = self.classes_
    _y = self._y
    if not self.outputs_2d_:
        _y = self._y.reshape((-1, 1))
        classes_ = [self.classes_]
    n_samples = X.shape[0]
    weights = _get_weights(neigh_dist, self.weights)
    if weights is None:
        weights = np.ones_like(neigh_ind)
    all_rows = np.arange(X.shape[0])
    probabilities = []
    for k, classes_k in enumerate(classes_):
        pred_labels = _y[:, k][neigh_ind]
        proba_k = np.zeros((n_samples, classes_k.size))
        # a simple ':' index doesn't work right
        for i, idx in enumerate(pred_labels.T):  # loop is O(n_neighbors)
            proba_k[all_rows, idx] += weights[:, i]
        # normalize 'votes' into real [0,1] probabilities
        normalizer = proba_k.sum(axis=1)[:, np.newaxis]
        normalizer[normalizer == 0.0] = 1.0
        proba_k /= normalizer
        probabilities.append(proba_k)
    if not self.outputs_2d_:
        probabilities = probabilities[0]
    return probabilities
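The inner loop of that method always fills a full (n_samples, n_classes) array even when only one column is wanted. A minimal NumPy sketch contrasting the two, on hypothetical toy arrays shaped like pred_labels (class indices of each sample's neighbours) and weights inside predict_proba; all_classes and one_class are illustrative helper names, not scikit-learn functions.

```python
import numpy as np

# Hypothetical toy inputs: row s lists the class indices / vote weights
# of the 3 nearest neighbours of test sample s.
pred_labels = np.array([[0, 2, 2],
                        [1, 1, 0],
                        [3, 3, 3]])
weights = np.array([[1.0, 1.0, 1.0],
                    [0.5, 2.0, 1.0],
                    [1.0, 1.0, 1.0]])


def all_classes(pred_labels, weights, n_classes):
    # Same accumulation as the loop in predict_proba: one column per class.
    rows = np.arange(pred_labels.shape[0])
    proba = np.zeros((pred_labels.shape[0], n_classes))
    for i, idx in enumerate(pred_labels.T):
        proba[rows, idx] += weights[:, i]
    normalizer = proba.sum(axis=1)[:, np.newaxis]
    normalizer[normalizer == 0.0] = 1.0
    return proba / normalizer


def one_class(pred_labels, weights, class_idx):
    # Only the votes for class_idx are summed; no per-class matrix is built.
    votes = (weights * (pred_labels == class_idx)).sum(axis=1)
    total = weights.sum(axis=1)
    total[total == 0.0] = 1.0
    return votes / total
```

Because every neighbour's weight lands in exactly one column, the row sums used as the normalizer equal weights.sum(axis=1), so one_class(pred_labels, weights, c) matches column c of all_classes.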
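Just modify the code so that the loop
for k, classes_k in enumerate(classes_):
handles only the one class you need. A hacky alternative is to overwrite the classes_ variable so that it becomes a singleton containing only the class of interest, and to restore it once you are done. Note that self._y stores each training label as an index into the full classes_ array, so the proba_k[all_rows, idx] indexing in the loop would also have to be remapped for this to work.
For machine-learning - How to restrict predicted probabilities to one class, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18364065/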