python - Ordinal categorical data in regression

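I have a dataset containing categorical data whose categories carry different weights. For example, Phd ranks higher than Masters, and MSc ranks higher than Bsc.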

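I know I should use a label encoder, but I don't want Python to assign codes to these values arbitrarily. I want to set the codes myself, with higher codes for higher levels: Phd = 4, Msc = 3, Bsc = 2, O Levels = 1 and No Education = 0.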

Is there any way to do this? Can anyone help?

Best answer

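LabelEncoder encodes the categories in alphabetical (sorted) order and stores them in the classes_ attribute; each label's code is its position in that list. This is what happens by default: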

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(['Phd', 'Msc', 'Bsc', 'O Levels', 'No education'])
le.classes_  # classes are stored in sorted (alphabetical) order
# Output: array(['Bsc', 'Msc', 'No education', 'O Levels', 'Phd'], dtype='<U12')
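To see why alphabetical codes are a problem for ordinal data, here is a minimal pure-Python sketch that mimics the default LabelEncoder behaviour described above (sort the unique labels, use each label's index as its code). The helper alphabetical_codes is hypothetical and is not part of scikit-learn.

```python
def alphabetical_codes(labels):
    """Hypothetical helper mimicking LabelEncoder: sort unique labels, code = index."""
    classes = sorted(set(labels))
    return {label: code for code, label in enumerate(classes)}

levels = ['Phd', 'Msc', 'Bsc', 'O Levels', 'No education']
codes = alphabetical_codes(levels)
print(codes)
# {'Bsc': 0, 'Msc': 1, 'No education': 2, 'O Levels': 3, 'Phd': 4}
```

Phd happens to get 4, but 'No education' (2) outranks both Bsc (0) and Msc (1), so a regression model would see an ordering that has nothing to do with the level of education.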

How many categories are there? If there are only a few, you can do the conversion yourself with a dictionary, similar to this answer here:

import numpy as np

my_dict = {'Phd': 4, 'Msc': 3, 'Bsc': 2, 'O Levels': 1, 'No education': 0}

y = ['No education', 'O Levels', 'Bsc', 'Msc', 'Phd']
np.vectorize(my_dict.get)(y)  # look up each label's code in my_dict

# Output: array([0, 1, 2, 3, 4])
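A hand-written dictionary has to be kept in sync manually, and my_dict.get silently returns None for a label that is not a key (for example a typo such as 'PhD'). A minimal pure-Python sketch, assuming the five levels from the question are the only valid ones, derives the codes from a single ordered list and rejects unexpected labels. The names EDUCATION_ORDER, ORDINAL_CODE and encode_ordinal are hypothetical.

```python
# Hypothetical ordinal levels, listed once from lowest to highest;
# a level's code is simply its position in this list.
EDUCATION_ORDER = ['No education', 'O Levels', 'Bsc', 'Msc', 'Phd']
ORDINAL_CODE = {level: code for code, level in enumerate(EDUCATION_ORDER)}

def encode_ordinal(values, mapping=ORDINAL_CODE):
    """Map each label to its ordinal code; raise on labels missing from the mapping."""
    unknown = sorted(set(values) - set(mapping))
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return [mapping[v] for v in values]

y = ['Phd', 'Bsc', 'No education', 'Msc', 'O Levels']
print(encode_ordinal(y))  # [4, 2, 0, 3, 1]
```

Because the order lives in one list, adding a new level later only means inserting it at the right position; the codes of the levels above it shift automatically.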

Regarding python - Ordinal categorical data in regression, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49854070/
