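This is a dataset with 3 columns and 3 rows:

Name    Org    Department
Manie   ABC2   Finance
Joyce   ABC1   HR
Ami     NSV2   HR

Everything works so far. How can I drop the first dummy column from each group of dummy variable columns?

Here is my code: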

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Data1.csv',encoding = "cp1252")
X = dataset.values


# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_0 = LabelEncoder()
X[:, 0] = labelencoder_X_0.fit_transform(X[:, 0])
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])

onehotencoder = OneHotEncoder(categorical_features = "all")
X = onehotencoder.fit_transform(X).toarray()

Best answer

import pandas as pd
df = pd.DataFrame({'name': ['Manie', 'Joyce', 'Ami'],
                   'Org':  ['ABC2', 'ABC1', 'NSV2'],
                   'Dept': ['Finance', 'HR', 'HR']
                   })

# drop_first=True removes the first level of every encoded column
df_2 = pd.get_dummies(df, drop_first=True)

Test:
print(df_2)
   Dept_HR  Org_ABC2  Org_NSV2  name_Joyce  name_Manie
0        0         1         0           0           1
1        1         0         0           1           0
2        1         0         1           0           0
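To see exactly which columns drop_first=True removes, the reduced encoding can be compared with the full one. This is a minimal sketch reusing the df from above. The names full, reduced and dropped are illustrative, and dtype=int is an assumption added here so the result is 0/1 on newer pandas versions, where the default dummy dtype is boolean.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Manie', 'Joyce', 'Ami'],
                   'Org':  ['ABC2', 'ABC1', 'NSV2'],
                   'Dept': ['Finance', 'HR', 'HR']})

# Full encoding: one dummy column per distinct value
full = pd.get_dummies(df, dtype=int)

# Reduced encoding: the first (alphabetically smallest) level of each column is dropped
reduced = pd.get_dummies(df, drop_first=True, dtype=int)

# Columns that exist in the full encoding but not in the reduced one
dropped = [c for c in full.columns if c not in reduced.columns]
print(dropped)
```

A row whose remaining dummies for a column are all 0 belongs to the dropped level, so no information is lost.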

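Update regarding the error raised by the following command:
pd.get_dummies(X, columns =[1:]
According to the documentation page, the columns parameter takes column names, not positions. So the following code works: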
df_2 = pd.get_dummies(df, columns=['Org', 'Dept'], drop_first=True)

Output:
    name  Org_ABC2  Org_NSV2  Dept_HR
0  Manie         1         0        0
1  Joyce         0         0        1
2    Ami         0         1        1

If you really want to select the columns by position, you can do it like this:
column_names_for_onehot = df.columns[1:]
df_2 = pd.get_dummies(df, columns=column_names_for_onehot, drop_first=True)
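The same idea extends to arbitrary positions, not only a slice. The sketch below assumes the df from the answer. The variable names positions, cols and encoded are illustrative, and dtype=int is an added assumption to get 0/1 values on newer pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Manie', 'Joyce', 'Ami'],
                   'Org':  ['ABC2', 'ABC1', 'NSV2'],
                   'Dept': ['Finance', 'HR', 'HR']})

# Positions of the columns to encode (0-based); 'name' at position 0 is left alone
positions = [1, 2]

# Translate positions into column names, which is what get_dummies expects
cols = list(df.columns[positions])

encoded = pd.get_dummies(df, columns=cols, drop_first=True, dtype=int)
print(encoded.columns.tolist())
```

Columns that are not encoded stay first in the result, followed by the dummy columns in the order the source columns were listed.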
