Problem description
The following code:
from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
lb.fit_transform(['yes', 'no', 'no', 'yes'])
returns:
array([[1],
       [0],
       [0],
       [1]])
However, I would like there to be one column per class:
array([[1, 0],
       [0, 1],
       [0, 1],
       [1, 0]])
(I need the data in this format so that I can feed it to a neural network that uses the softmax function at the output layer.)
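To illustrate the target format: a softmax output layer with k units is normally trained against a target matrix with k columns, one per class. The following is a minimal NumPy-only sketch of such an encoding; the helper name one_hot is hypothetical and not part of scikit-learn. Note that its columns follow the sorted class order ('no' first, then 'yes'), which is the reverse of the column order shown in the question.

```python
import numpy as np

def one_hot(labels):
    """Hypothetical helper: return (classes, matrix) with one column per class."""
    # np.unique sorts the distinct labels and, with return_inverse=True,
    # also returns each label's index into that sorted array.
    classes, indices = np.unique(np.asarray(labels).ravel(), return_inverse=True)
    # Row i of the identity matrix is the one-hot vector for class i.
    matrix = np.eye(len(classes), dtype=int)[indices]
    return classes, matrix

classes, encoded = one_hot(['yes', 'no', 'no', 'yes'])
print(classes)   # column j of `encoded` corresponds to classes[j]
print(encoded)
```

Because the column count always equals the number of distinct labels, the two-class case needs no special handling here.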
When there are more than 2 classes, LabelBinarizer behaves as desired:
from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
lb.fit_transform(['yes', 'no', 'no', 'yes', 'maybe'])
returns:
array([[0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]])
Above, there is one column per class.
Is there a simple way to get the same result (one column per class) when there are only 2 classes?
Edit: Based on yangjie's answer, I wrote a class that wraps LabelBinarizer to produce the desired behavior described above: http://pastebin.com/UEL2dP62
import numpy as np
from sklearn.preprocessing import LabelBinarizer

class LabelBinarizer2:

    def __init__(self):
        self.lb = LabelBinarizer()

    def fit(self, X):
        # Convert X to array
        X = np.array(X)
        # Fit X using the LabelBinarizer object
        self.lb.fit(X)
        # Save the classes
        self.classes_ = self.lb.classes_

    def fit_transform(self, X):
        # Convert X to array
        X = np.array(X)
        # Fit + transform X using the LabelBinarizer object
        Xlb = self.lb.fit_transform(X)
        # Save the classes
        self.classes_ = self.lb.classes_
        # With 2 classes, append the complement as a second column
        if len(self.classes_) == 2:
            Xlb = np.hstack((Xlb, 1 - Xlb))
        return Xlb

    def transform(self, X):
        # Convert X to array
        X = np.array(X)
        # Transform X using the LabelBinarizer object
        Xlb = self.lb.transform(X)
        # With 2 classes, append the complement as a second column
        if len(self.classes_) == 2:
            Xlb = np.hstack((Xlb, 1 - Xlb))
        return Xlb

    def inverse_transform(self, Xlb):
        # Convert Xlb to array
        Xlb = np.array(Xlb)
        # With 2 classes, only the first column is the original output
        if len(self.classes_) == 2:
            X = self.lb.inverse_transform(Xlb[:, 0])
        else:
            X = self.lb.inverse_transform(Xlb)
        return X
Edit 2: It turns out yangjie has also written a new version of LabelBinarizer, awesome!
Recommended answer
I think there is no direct way to do it, especially if you also want inverse_transform to work.
But you can easily construct the labels with numpy:
In [18]: import numpy as np
In [19]: from sklearn.preprocessing import LabelBinarizer
In [20]: lb = LabelBinarizer()
In [21]: label = lb.fit_transform(['yes', 'no', 'no', 'yes'])
In [22]: label = np.hstack((label, 1 - label))
In [23]: label
Out[23]:
array([[1, 0],
       [0, 1],
       [0, 1],
       [1, 0]])
Then you can use inverse_transform by slicing out the first column:
In [24]: lb.inverse_transform(label[:, 0])
Out[24]:
array(['yes', 'no', 'no', 'yes'],
      dtype='<U3')
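Slicing the first column works because that column is exactly what LabelBinarizer produced, with 1 marking lb.classes_[1] (here 'yes'). As a variation on the approach above, and not part of the original answer, the sketch below puts the complement column first so that column j lines up with lb.classes_[j]; network outputs can then be decoded with argmax. The scores array is made-up illustrative data, and the resulting column order ('no', 'yes') is the reverse of the one shown in the question.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
# Shape (4, 1); a 1 marks the positive class, lb.classes_[1].
col = lb.fit_transform(['yes', 'no', 'no', 'yes'])

# Put the negative class first so column j corresponds to lb.classes_[j].
onehot = np.hstack((1 - col, col))

# Hypothetical softmax-style scores using the same column order.
scores = np.array([[0.2, 0.8],
                   [0.9, 0.1],
                   [0.6, 0.4],
                   [0.3, 0.7]])

# argmax picks the most likely column; index classes_ to recover labels.
decoded = lb.classes_[scores.argmax(axis=1)]
print(onehot)
print(decoded)
```

The same argmax decoding applies unchanged to the multiclass case, where LabelBinarizer already emits one column per entry of classes_.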
Based on the above solution, you can write a class that inherits from LabelBinarizer, which makes the operations and results consistent for both the binary and the multiclass case.
from sklearn.preprocessing import LabelBinarizer
import numpy as np

class MyLabelBinarizer(LabelBinarizer):
    def transform(self, y):
        Y = super().transform(y)
        # In the binary case, append the complement as a second column
        if self.y_type_ == 'binary':
            return np.hstack((Y, 1-Y))
        else:
            return Y

    def inverse_transform(self, Y, threshold=None):
        # In the binary case, only the first column is the original output
        if self.y_type_ == 'binary':
            return super().inverse_transform(Y[:, 0], threshold)
        else:
            return super().inverse_transform(Y, threshold)
Then:
lb = MyLabelBinarizer()
label1 = lb.fit_transform(['yes', 'no', 'no', 'yes'])
print(label1)
print(lb.inverse_transform(label1))
label2 = lb.fit_transform(['yes', 'no', 'no', 'yes', 'maybe'])
print(label2)
print(lb.inverse_transform(label2))
gives:
[[1 0]
 [0 1]
 [0 1]
 [1 0]]
['yes' 'no' 'no' 'yes']
[[0 0 1]
 [0 1 0]
 [0 1 0]
 [0 0 1]
 [1 0 0]]
['yes' 'no' 'no' 'yes' 'maybe']
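As an alternative that the answer above does not mention, scikit-learn's OneHotEncoder may also be worth considering, since it emits one column per category regardless of how many categories there are. The sketch below is a rough illustration under two assumptions: the labels are reshaped into a single-feature 2-D array (which OneHotEncoder expects), and the default sparse result is densified with toarray(). Its columns follow the sorted order in enc.categories_[0], so 'no' comes before 'yes'.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder expects a 2-D array of shape (n_samples, n_features).
labels = np.array(['yes', 'no', 'no', 'yes']).reshape(-1, 1)

enc = OneHotEncoder()
# The default output is a sparse matrix; toarray() makes it dense.
onehot = enc.fit_transform(labels).toarray()

print(enc.categories_[0])   # column order of `onehot`
print(onehot)

# inverse_transform returns a 2-D array; ravel() flattens it to 1-D.
restored = enc.inverse_transform(onehot).ravel()
print(restored)
```

The trade-off is that OneHotEncoder is designed for input features rather than targets, so the reshape and the 2-D return value of inverse_transform need to be handled by the caller.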