Problem description
I am using OneHotEncoder to encode a few categorical variables (e.g. Sex and AgeGroup). The feature names produced by the encoder look like 'x0_female', 'x0_male', 'x1_0.0', 'x1_15.0', etc.
>>> import pandas as pd
>>> from sklearn.preprocessing import OneHotEncoder
>>> train_X = pd.DataFrame({'Sex': ['male', 'female'] * 3, 'AgeGroup': [0, 15, 30, 45, 60, 75]})
>>> encoder = OneHotEncoder()
>>> train_X_encoded = encoder.fit_transform(train_X[['Sex', 'AgeGroup']])
>>> encoder.get_feature_names()
array(['x0_female', 'x0_male', 'x1_0.0', 'x1_15.0', 'x1_30.0', 'x1_45.0',
       'x1_60.0', 'x1_75.0'], dtype=object)
Is there a way to tell OneHotEncoder to build the feature names with the column name as a prefix, e.g. Sex_female, AgeGroup_15.0, similar to what pandas get_dummies() does?
Yes. You can pass a list of the original column names to get_feature_names:
encoder.get_feature_names(['Sex', 'AgeGroup'])
will return:
['Sex_female', 'Sex_male', 'AgeGroup_0', 'AgeGroup_15',
'AgeGroup_30', 'AgeGroup_45', 'AgeGroup_60', 'AgeGroup_75']
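Note that get_feature_names was deprecated in scikit-learn 1.0 and removed in 1.2 in favor of get_feature_names_out, which accepts the same list of column names. The following is a minimal sketch, not a definitive recipe, that works with either method and wraps the encoded matrix in a DataFrame with the prefixed names. The hasattr fallback and the encoded_df variable are illustrative choices and are not part of the original answer.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train_X = pd.DataFrame({
    "Sex": ["male", "female"] * 3,
    "AgeGroup": [0, 15, 30, 45, 60, 75],
})
columns = ["Sex", "AgeGroup"]

encoder = OneHotEncoder()
encoded = encoder.fit_transform(train_X[columns])  # sparse matrix by default

# Newer scikit-learn exposes get_feature_names_out; older releases only
# have get_feature_names. Use whichever one is available.
if hasattr(encoder, "get_feature_names_out"):
    names = encoder.get_feature_names_out(columns)
else:
    names = encoder.get_feature_names(columns)

# Densify the sparse result and label the columns with the prefixed names.
encoded_df = pd.DataFrame(encoded.toarray(), columns=names, index=train_X.index)
print(list(encoded_df.columns))
```

The resulting column names start with Sex_ and AgeGroup_, which matches the naming scheme of pandas get_dummies().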