This question already has answers here:
Running get_dummies on several DataFrame columns? (4 answers)
Closed 23 days ago.

Suppose we have a list of employees and some other data:

  Employee   Location   Title
0        1  Location1  Title1
1        2  Location2  Title1
2        3  Location3  Title2
3        4  Location1  Title3
4        5  Location1  Title2


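I convert this into features and labels with (1, 0) values. It works, but it takes a while on a dataset of 6k records. The logic: take each distinct value from "Location" and make it a column; the cell is 1 if the employee's "Location" matches that column, and 0 otherwise. The same is done for "Title".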

My question: is there a way to optimize the performance? I lack the terminology to search for a better solution, but I believe one should exist.

The final output looks like this:

  Employee  Location1  Location2  Location3  Title1  Title2  Title3
0        1          1          0          0       1       0       0
1        2          0          1          0       1       0       0
2        3          0          0          1       0       1       0
3        4          1          0          0       0       0       1
4        5          1          0          0       0       1       0


Working code that takes a long time to finish:

import pandas as pd
df = pd.DataFrame.from_dict({'Employee': ['1','2','3','4','5'],
      'Location': ['Location1', 'Location2','Location3','Location1','Location1'],
      'Title': ['Title1','Title1','Title2','Title3','Title2']
     })
df_tr = df['Employee'] #temporary employee ids

# transposing the data, which takes ages:

df_newcols = {}
for column in list(df)[1:]:
    newcols = df[column].unique()
    for key in newcols:
        temp_ar = []
        for value in df[column]:
            if key == value:
                temp_ar.append(1)
            else:
                temp_ar.append(0)
        df_newcols[key] = temp_ar
print (df_newcols)

# adding transposed to the temp df

df_temp = pd.DataFrame.from_dict(df_newcols)

# merging with df with employee ids

new_df = pd.concat([df_tr,df_temp],axis=1)

Best answer

This should do the trick:

# helper column of ones to aggregate over (note: this adds a column to df)
df["_dummy"] = 1
df2 = pd.concat([
    df.pivot_table(index="Employee", columns="Location", values="_dummy", aggfunc="max"),
    df.pivot_table(index="Employee", columns="Title", values="_dummy", aggfunc="max")
], axis=1).fillna(0).astype(int).reset_index(drop=False)


Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
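
For comparison, pandas also provides `pd.get_dummies`, which may be a more direct way to get the same 0/1 layout without a helper column. The following is only a minimal sketch, not part of the answer above. It assumes the sample data from the question, and it assumes that passing an empty `prefix` and `prefix_sep` is acceptable because the values in "Location" and "Title" do not collide. The name `encoded` is introduced here purely for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Employee": ["1", "2", "3", "4", "5"],
    "Location": ["Location1", "Location2", "Location3", "Location1", "Location1"],
    "Title": ["Title1", "Title1", "Title2", "Title3", "Title2"],
})

# One-hot encode both categorical columns in a single vectorized call.
# prefix="" and prefix_sep="" keep the bare values as column names
# (assumption: no value appears in both columns, otherwise names would clash).
# dtype=int yields 0/1 integers instead of booleans.
encoded = pd.get_dummies(
    df,
    columns=["Location", "Title"],
    prefix="",
    prefix_sep="",
    dtype=int,
)

print(encoded)
```

Columns not listed in `columns` (here "Employee") are passed through unchanged, so no separate concat step is needed. If the same value could occur in both "Location" and "Title", keeping the default prefixes would avoid duplicate column names.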

Regarding "python - optimizing performance when converting values to 0 and 1", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59779778/
