python - Optimizing performance when converting values to 0 and 1

This question already has answers here:

Running get_dummies on several DataFrame columns? (4 answers)

Closed 23 days ago.


Suppose we have a list of employees and some other data:

  Employee   Location   Title
0        1  Location1  Title1
1        2  Location2  Title1
2        3  Location3  Title2
3        4  Location1  Title3
4        5  Location1  Title2

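I am converting it into features and labels with (1, 0) values. It works, but it takes a long time on a database of 6k records. The logic: take each value from "Location" and make it a column; if the employee's "Location" matches that column, the value is 1, otherwise 0.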
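As a minimal sketch of that rule for a single value, the comparison can be applied to the whole column at once instead of row by row. The name `is_location1` is illustrative only and does not appear in the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": ["1", "2", "3", "4", "5"],
    "Location": ["Location1", "Location2", "Location3", "Location1", "Location1"],
})

# Compare every row against one value in a single vectorized step,
# then cast the resulting True/False values to 1/0.
is_location1 = (df["Location"] == "Location1").astype(int)
print(is_location1.tolist())
```

Repeating this for every distinct value in every column gives the table the question asks for.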

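My question: is there a way to optimize the performance? I lack the terminology, which makes it hard to search for a better solution, but I assume one exists.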

The final output looks like this:

  Employee  Location1  Location2  Location3  Title1  Title2  Title3
0        1          1          0          0       1       0       0
1        2          0          1          0       1       0       0
2        3          0          0          1       0       1       0
3        4          1          0          0       0       0       1
4        5          1          0          0       0       1       0

The working code, which takes a long time to finish:

import pandas as pd
df = pd.DataFrame.from_dict({'Employee': ['1','2','3','4','5'],
      'Location': ['Location1', 'Location2','Location3','Location1','Location1'],
      'Title': ['Title1','Title1','Title2','Title3','Title2']
     })
df_tr = df['Employee'] #temporary employee ids

# transposing the data, which takes ages:

df_newcols = {}
for column in list(df)[1:]:
    newcols = df[column].unique()
    for key in newcols:
        temp_ar = []
        for value in df[column]:
            if key == value:
                temp_ar.append(1)
            else:
                temp_ar.append(0)
        df_newcols[key] = temp_ar
print (df_newcols)

# adding transposed to the temp df

df_temp = pd.DataFrame.from_dict(df_newcols)

# merging with df with employee ids

new_df = pd.concat([df_tr,df_temp],axis=1)

Best answer

This should do the trick:

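# Helper column of ones to aggregate; note that this adds "_dummy" to df in place.
df["_dummy"] = 1
df2 = pd.concat([
    # One pivot per categorical column: rows are employees, columns are the distinct values.
    df.pivot_table(index="Employee", columns="Location", values="_dummy", aggfunc="max"),
    df.pivot_table(index="Employee", columns="Title", values="_dummy", aggfunc="max"),
], axis=1).fillna(0).astype(int).reset_index(drop=False)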

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
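As an alternative sketch, not part of the answer above: the linked duplicate question is about `pd.get_dummies`, which can encode several columns in one call. The example below assumes the same data as in the question. It passes empty `prefix` and `prefix_sep` so the new columns are named after the values themselves; that works here only because no value appears in both Location and Title.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": ["1", "2", "3", "4", "5"],
    "Location": ["Location1", "Location2", "Location3", "Location1", "Location1"],
    "Title": ["Title1", "Title1", "Title2", "Title3", "Title2"],
})

# One-hot encode only the listed columns; "Employee" is left untouched.
# dtype=int yields 1/0 instead of True/False.
encoded = pd.get_dummies(
    df,
    columns=["Location", "Title"],
    prefix="",
    prefix_sep="",
    dtype=int,
)
print(encoded)
```

This approach does not need the helper `_dummy` column and keeps the rows in their original order.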

Regarding python - optimizing performance when converting values to 0 and 1, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59779778/