fit_transform() takes 2 positional arguments but 3 were given with LabelBinarizer

Problem description

I am totally new to machine learning and have been working with unsupervised learning techniques.

The image shows my sample data (after all cleaning). Screenshot: Sample Data

I have built these two pipelines to clean the data:

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

print(type(num_attribs))

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer())
])

Then I combined the two pipelines with a FeatureUnion, as shown below:

from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[
        ("num_pipeline", num_pipeline),
        ("cat_pipeline", cat_pipeline),
    ])

Now I am trying to call fit_transform on the data, but it raises an error.

Transformation code:

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared

Error message:

fit_transform() takes 2 positional arguments but 3 were given

Recommended answer

The problem:

The pipeline assumes that LabelBinarizer's fit_transform method is defined to take three positional arguments:

def fit_transform(self, x, y):
    ...  # rest of the code

whereas it is actually defined to take only two:

def fit_transform(self, x):
    ...  # rest of the code
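The mismatch can be reproduced without the full housing pipeline. The sketch below is only an illustration: the label values are made up, and the second call imitates by hand what a Pipeline does when it forwards both X and y to its final step.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical category values standing in for the ocean_proximity column.
labels = ["INLAND", "NEAR BAY", "INLAND", "NEAR OCEAN"]

# Called the way LabelBinarizer expects: one argument besides self.
one_hot = LabelBinarizer().fit_transform(labels)

# Called with an extra positional argument, imitating a Pipeline that
# passes (X, y) to the fit_transform of its last step.
try:
    LabelBinarizer().fit_transform(labels, None)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(one_hot.shape)
print(error_message)
```

The first call succeeds, while the second fails with the same kind of TypeError as in the question, which suggests the problem lies in how the method is called rather than in the data.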

Possible solution:

This can be solved by writing a custom transformer that can handle three positional arguments:

  1. Import and make a new class:

from sklearn.base import TransformerMixin  # provides fit_transform for free
from sklearn.preprocessing import LabelBinarizer

class MyLabelBinarizer(TransformerMixin):
    def __init__(self, *args, **kwargs):
        self.encoder = LabelBinarizer(*args, **kwargs)

    def fit(self, x, y=None):
        # y is accepted (and ignored) so the extra argument from the Pipeline is tolerated
        self.encoder.fit(x)
        return self

    def transform(self, x, y=None):
        return self.encoder.transform(x)

  2. Keep the rest of your code the same, but use the class we created, MyLabelBinarizer(), instead of LabelBinarizer().

    Note: If you want to access LabelBinarizer attributes (e.g. classes_), add the following line to the fit method:

        self.classes_, self.y_type_, self.sparse_input_ = self.encoder.classes_, self.encoder.y_type_, self.encoder.sparse_input_
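As a rough end-to-end check, the wrapper idea can be tried inside a one-step Pipeline. This is a sketch under assumptions, not the answer's exact class: the name PipelineLabelBinarizer, the explicit neg_label/pos_label parameters, the BaseEstimator base class, and the sample labels are all introduced here for illustration. The encoder is built in fit rather than in __init__ so that the class follows the usual scikit-learn estimator conventions.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer


class PipelineLabelBinarizer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper whose fit/transform tolerate the y a Pipeline passes."""

    def __init__(self, neg_label=0, pos_label=1):
        self.neg_label = neg_label
        self.pos_label = pos_label

    def fit(self, X, y=None):
        # y is accepted only for Pipeline compatibility and is ignored.
        self.encoder_ = LabelBinarizer(
            neg_label=self.neg_label, pos_label=self.pos_label
        )
        self.encoder_.fit(X)
        # Expose the fitted classes on the wrapper itself.
        self.classes_ = self.encoder_.classes_
        return self

    def transform(self, X, y=None):
        return self.encoder_.transform(X)


# Hypothetical sample values standing in for the ocean_proximity column.
labels = np.array(["INLAND", "NEAR BAY", "INLAND", "NEAR OCEAN"])

cat_pipeline = Pipeline([("label_binarizer", PipelineLabelBinarizer())])
encoded = cat_pipeline.fit_transform(labels)

print(cat_pipeline.named_steps["label_binarizer"].classes_)
print(encoded)
```

Here the labels are passed as a 1-D array. A selector that returns a single-column 2-D array, as in the question, may need to be flattened first.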
    
