Problem description
Consider the following DataFrame:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
df = pd.DataFrame(data=[["France", "Italy", "Belgium"], ["Italy", "France", "Belgium"]], columns=["a", "b", "c"])
df = df.apply(LabelEncoder().fit_transform)
print(df)
Current output:
   a  b  c
0  0  1  0
1  1  0  0
My goal is to get output like the following by passing in the columns that should share categorical values:
   a  b  c
0  0  1  2
1  1  0  2
Recommended answer
Pass axis=1 to call LabelEncoder().fit_transform once for each row. (By default, df.apply(func) calls func once for each column.) Note that the encoder is refit on every row, so the labels agree across rows only when every row contains the same set of values, as it does here.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
df = pd.DataFrame(data=[["France", "Italy", "Belgium"],
["Italy", "France", "Belgium"]], columns=["a", "b", "c"])
encoder = LabelEncoder()
df = df.apply(encoder.fit_transform, axis=1)
print(df)
yields
   a  b  c
0  1  2  0
1  2  1  0
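Because fit_transform refits the encoder on each row, the row-wise call only produces consistent labels when every row holds the same values. Depending on the pandas version, the row-wise apply may also return a Series of arrays rather than a DataFrame. A minimal sketch of one way around both points, assuming it is acceptable to fit a single encoder on all values first and then transform each column (the names `encoder` and `encoded` are illustrative only):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame(
    data=[["France", "Italy", "Belgium"], ["Italy", "France", "Belgium"]],
    columns=["a", "b", "c"],
)

# Fit once on every value in the frame so all columns share one label mapping.
encoder = LabelEncoder()
encoder.fit(df.to_numpy().ravel())

# Transform column by column and rebuild a DataFrame with the original index.
encoded = pd.DataFrame(
    {col: encoder.transform(df[col].to_numpy()) for col in df.columns},
    index=df.index,
)
print(encoded)
```

Here the classes are learned once (sorted as Belgium, France, Italy), so a value maps to the same label wherever it appears, even if some rows lack some values.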
Alternatively, you could convert the data to category dtype and use the category codes as labels:
import pandas as pd
df = pd.DataFrame(data=[["France", "Italy", "Belgium"],
["Italy", "France", "Belgium"]], columns=["a", "b", "c"])
stacked = df.stack().astype('category')
result = stacked.cat.codes.unstack()
print(result)
also yields
   a  b  c
0  1  2  0
1  2  1  0
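The codes above follow the sorted order of the categories, so the numbering differs from the one shown in the question (0 1 2 / 1 0 2). If the labels should instead follow order of first appearance, pd.factorize is one option. A small sketch, assuming the values are read row by row:

```python
import pandas as pd

df = pd.DataFrame(
    data=[["France", "Italy", "Belgium"], ["Italy", "France", "Belgium"]],
    columns=["a", "b", "c"],
)

# factorize numbers values in order of first appearance in the flattened
# (row-major) array, so France -> 0, Italy -> 1, Belgium -> 2 here.
codes, uniques = pd.factorize(df.to_numpy().ravel())

# Reshape the flat codes back to the frame's shape and restore its labels.
result = pd.DataFrame(codes.reshape(df.shape), index=df.index, columns=df.columns)
print(result)
```

The `uniques` array keeps the original values in code order, so it can be used to map the labels back.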
This should be significantly faster, since it does not require calling encoder.fit_transform once for each row (which might give terrible performance if you have lots of rows).
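The question asks about passing in the columns that should share labels. The category-code approach can be wrapped for that. The helper below is a hypothetical sketch (`encode_shared` is not a pandas or scikit-learn function), assuming the selected columns contain no missing values:

```python
import pandas as pd

def encode_shared(df, columns):
    """Return a copy of df where `columns` are label-encoded with shared codes."""
    out = df.copy()
    # Stack the selected columns into one long Series so they share one category set.
    stacked = out[columns].stack().astype("category")
    # cat.codes numbers the sorted categories from 0; unstack restores the shape.
    codes = stacked.cat.codes.unstack()
    for col in columns:
        out[col] = codes[col]
    return out

df = pd.DataFrame(
    data=[["France", "Italy", "Belgium"], ["Italy", "France", "Belgium"]],
    columns=["a", "b", "c"],
)
result = encode_shared(df, ["a", "b"])
print(result)
```

In this call only columns a and b are encoded together (France becomes 0, Italy becomes 1), and column c is left as it was.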