Sklearn 管道:在 ColumnTransformer 中的 OneHotEncode 之后获取特征名称

本文介绍了Sklearn 管道:在 ColumnTransformer 中的 OneHotEncode 之后获取特征名称的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我想在拟合管道后获取特征名称.

I want to get feature names after I fit the pipeline.

categorical_features = ['brand', 'category_name', 'sub_category']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

numeric_features = ['num1', 'num2', 'num3', 'num4']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

然后

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('regressor', GradientBoostingRegressor())])

与熊猫数据框拟合后，我可以从

After fitting with pandas dataframe, I can get feature importances from

clf.steps[1][1].feature_importances_

我尝试了 clf.steps[0][1].get_feature_names() 但我遇到了错误

and I tried clf.steps[0][1].get_feature_names() but I got an error

AttributeError: Transformer num (type Pipeline) does not provide get_feature_names.

我如何从中获取功能名称?

How can I get feature names from this?

推荐答案

您可以使用以下代码段访问 feature_names！

You can access the feature_names using the following snippet!

clf.named_steps['preprocessor'].transformers_[1][1]\
   .named_steps['onehot'].get_feature_names(categorical_features)

使用sklearn >= 0.21版本，我们可以让它更简单:

Using sklearn >= 0.21 version, we can make it more simpler:

clf['preprocessor'].transformers_[1][1]['onehot']\
                   .get_feature_names(categorical_features)

可重现的例子:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({'brand': ['aaaa', 'asdfasdf', 'sadfds', 'NaN'],
                   'category': ['asdf', 'asfa', 'asdfas', 'as'],
                   'num1': [1, 1, 0, 0],
                   'target': [0.2, 0.11, 1.34, 1.123]})

numeric_features = ['num1']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_features = ['brand', 'category']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('regressor',  LinearRegression())])
clf.fit(df.drop('target', 1), df['target'])

clf.named_steps['preprocessor'].transformers_[1][1]\
   .named_steps['onehot'].get_feature_names(categorical_features)

# ['brand_NaN' 'brand_aaaa' 'brand_asdfasdf' 'brand_sadfds' 'category_as'
#  'category_asdf' 'category_asdfas' 'category_asfa']

这篇关于Sklearn 管道:在 ColumnTransformer 中的 OneHotEncode 之后获取特征名称的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！