Problem description
I am completely new to machine learning and have been working with unsupervised learning techniques.
The screenshot shows my sample data after all cleaning. Screenshot: Sample Data
I have built these two pipelines to clean the data:
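from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Imputer, LabelBinarizer, StandardScaler

# housing_num, DataFrameSelector and CombinedAttributesAdder are defined elsewhere (not shown).
num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]
print(type(num_attribs))

# Numerical columns: select, impute missing values, add derived attributes, scale.
num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

# Categorical column: select, then binarize the labels.
cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer())
])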
Then I combined the two pipelines with a FeatureUnion, as shown below:
from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])
Now I am trying to call fit_transform on the data, but it raises an error.
Code for the transformation:
housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared
Error message:
fit_transform() takes 2 positional arguments but 3 were given
Recommended answer
The problem:
The pipeline assumes that LabelBinarizer's fit_transform method is defined to take three positional arguments:
def fit_transform(self, x, y):
    ...  # rest of the code
whereas it is actually defined to take only two:
def fit_transform(self, x):
    ...  # rest of the code
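The mismatch can be reproduced without scikit-learn. The following is a minimal sketch in plain Python: TwoArgBinarizer and run_step are hypothetical stand-ins (not scikit-learn names) for a transformer whose fit_transform accepts only the data, and for a pipeline that passes both the data and the target positionally.

```python
class TwoArgBinarizer:
    """Hypothetical stand-in: fit_transform accepts only self and the data."""

    def fit_transform(self, x):
        classes = sorted(set(x))
        return [[int(value == c) for c in classes] for value in x]


def run_step(step, X, y=None):
    """Hypothetical stand-in for a pipeline: passes data and target positionally."""
    return step.fit_transform(X, y)


try:
    run_step(TwoArgBinarizer(), ["a", "b", "a"])
    error_message = None
except TypeError as exc:
    # self + X + y makes three positional arguments, but only two are accepted.
    error_message = str(exc)

print(error_message)
```

The count in the message includes self, which is why a method declared with one data parameter reports "2 positional arguments".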
Possible solution:
This can be solved by writing a custom transformer that accepts three positional arguments.
Import TransformerMixin and define a new class:
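from sklearn.base import TransformerMixin  # gives fit_transform method for free
from sklearn.preprocessing import LabelBinarizer

class MyLabelBinarizer(TransformerMixin):
    def __init__(self, *args, **kwargs):
        # Wrap a regular LabelBinarizer and forward any constructor arguments to it.
        self.encoder = LabelBinarizer(*args, **kwargs)

    def fit(self, x, y=0):
        # y is accepted so the pipeline can pass it, but it is ignored.
        self.encoder.fit(x)
        return self

    def transform(self, x, y=0):
        return self.encoder.transform(x)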
Keep the rest of your code the same, but use the class we created, MyLabelBinarizer(), instead of LabelBinarizer().
Note: if you want to access the LabelBinarizer attributes (e.g. classes_), add the following line to the fit method, after the call to self.encoder.fit(x):
self.classes_, self.y_type_, self.sparse_input_ = self.encoder.classes_, self.encoder.y_type_, self.encoder.sparse_input_
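To see why the wrapper removes the error, here is a dependency-free sketch of the mechanism. All three classes are hypothetical stand-ins rather than scikit-learn code: FitTransformMixin is a simplified imitation of a mixin that builds fit_transform from fit and transform, TwoArgEncoder plays the role of the encoder that accepts only the data, and WrappedEncoder follows the same pattern as MyLabelBinarizer.

```python
class FitTransformMixin:
    """Hypothetical, simplified mixin: derives fit_transform from fit and transform."""

    def fit_transform(self, x, y=None):
        # Accepts the target positionally, hands it to fit, then transforms x.
        return self.fit(x, y).transform(x)


class TwoArgEncoder:
    """Hypothetical encoder whose methods accept only the data."""

    def fit(self, x):
        self.classes_ = sorted(set(x))
        return self

    def transform(self, x):
        return [[int(value == c) for c in self.classes_] for value in x]


class WrappedEncoder(FitTransformMixin):
    """Wrapper that accepts (x, y) but forwards only x to the inner encoder."""

    def __init__(self):
        self.encoder = TwoArgEncoder()

    def fit(self, x, y=None):
        self.encoder.fit(x)  # y is accepted and deliberately ignored
        self.classes_ = self.encoder.classes_  # expose the fitted attribute
        return self

    def transform(self, x, y=None):
        return self.encoder.transform(x)


# Called the way a pipeline calls its steps: data and target, both positional.
encoded = WrappedEncoder().fit_transform(["NEAR BAY", "INLAND", "NEAR BAY"], None)
print(encoded)
```

Because fit and transform both tolerate the extra argument, the inherited fit_transform can be called with data and target without raising, and copying classes_ inside fit makes the fitted attribute available on the wrapper itself.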