Sklearn Pipeline: getting feature names after OneHotEncoder in a ColumnTransformer
Problem description
I want to get the feature names after I fit the pipeline.
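from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Categorical columns: fill missing values, then one-hot encode.
categorical_features = ['brand', 'category_name', 'sub_category']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# Numeric columns: impute with the median, then standardize.
numeric_features = ['num1', 'num2', 'num3', 'num4']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])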
Then:
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('regressor', GradientBoostingRegressor())])
After fitting on a pandas DataFrame, I can get the feature importances from
clf.steps[1][1].feature_importances_
I then tried clf.steps[0][1].get_feature_names(), but I got an error:
AttributeError: Transformer num (type Pipeline) does not provide get_feature_names.
How can I get the feature names from this?
Recommended answer
You can access the feature names with the following snippet:
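# transformers_[1] is the fitted ('cat', pipeline, columns) tuple; [1] selects its pipeline.
# On scikit-learn < 1.0 call get_feature_names instead; it was removed in 1.2.
clf.named_steps['preprocessor'].transformers_[1][1]\
    .named_steps['onehot'].get_feature_names_out(categorical_features)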
With sklearn >= 0.21, pipeline steps can be indexed directly, which makes this simpler:
# On scikit-learn < 1.0 call get_feature_names instead; it was removed in 1.2.
clf['preprocessor'].transformers_[1][1]['onehot']\
    .get_feature_names_out(categorical_features)
Reproducible example:
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression

# Note: 'NaN' here is an ordinary string, not a missing value.
df = pd.DataFrame({'brand': ['aaaa', 'asdfasdf', 'sadfds', 'NaN'],
                   'category': ['asdf', 'asfa', 'asdfas', 'as'],
                   'num1': [1, 1, 0, 0],
                   'target': [0.2, 0.11, 1.34, 1.123]})

numeric_features = ['num1']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_features = ['brand', 'category']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('regressor', LinearRegression())])

# drop(columns=...) replaces the positional axis argument, which pandas 2.0 removed.
clf.fit(df.drop(columns='target'), df['target'])

# On scikit-learn < 1.0 call get_feature_names instead; it was removed in 1.2.
clf.named_steps['preprocessor'].transformers_[1][1]\
    .named_steps['onehot'].get_feature_names_out(categorical_features)
# ['brand_NaN' 'brand_aaaa' 'brand_asdfasdf' 'brand_sadfds' 'category_as'
#  'category_asdf' 'category_asdfas' 'category_asfa']