Problem description

I am using OneHotEncoder to encode a few categorical variables (e.g. Sex and AgeGroup). The feature names the encoder produces look like 'x0_female', 'x0_male', 'x1_0.0', 'x1_15.0', etc.

>>> import pandas as pd
>>> from sklearn.preprocessing import OneHotEncoder
>>> train_X = pd.DataFrame({'Sex': ['male', 'female']*3, 'AgeGroup': [0, 15, 30, 45, 60, 75]})
>>> encoder = OneHotEncoder()
>>> train_X_encoded = encoder.fit_transform(train_X[['Sex', 'AgeGroup']])
>>> encoder.get_feature_names()
array(['x0_female', 'x0_male', 'x1_0.0', 'x1_15.0', 'x1_30.0', 'x1_45.0',
       'x1_60.0', 'x1_75.0'], dtype=object)

Is there a way to make OneHotEncoder build feature names that start with the original column name, such as Sex_female and AgeGroup_15.0, similar to what pandas get_dummies() does?
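For reference, here is a small sketch of the naming scheme the question is after, as produced by pandas get_dummies() on the same data. It assumes a reasonably recent pandas. The columns= argument is passed because get_dummies() only expands object and category columns by default and would otherwise leave the integer AgeGroup column as-is.

```python
import pandas as pd

train_X = pd.DataFrame({'Sex': ['male', 'female'] * 3,
                        'AgeGroup': [0, 15, 30, 45, 60, 75]})

# List both columns explicitly so the integer AgeGroup column is expanded too;
# each dummy column is named "<column>_<category>".
dummies = pd.get_dummies(train_X, columns=['Sex', 'AgeGroup'])
print(list(dummies.columns))
```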

Solution

You can pass a list of the original column names to get_feature_names:

encoder.get_feature_names(['Sex', 'AgeGroup'])

will return:

['Sex_female', 'Sex_male', 'AgeGroup_0', 'AgeGroup_15',
 'AgeGroup_30', 'AgeGroup_45', 'AgeGroup_60', 'AgeGroup_75']

10-21 07:21