I have this dataset:

ARTID    INFO_1         INFO_2
00001   some_info_11   some_info_21
00002   some_info_12   some_info_22
00003   some_info_13   some_info_23


I want to transform it like this:

ARTID    some_info_11   some_info_12   some_info_13   some_info_21   some_info_22   some_info_23
00001    1              0              0              1              0              0
00002    0              1              0              0              1              0


But I need the result to be a sparse matrix. What is the most memory-friendly way to do this?

Best answer

Use pd.get_dummies() together with pd.concat():

df1 = pd.concat([df.ARTID,pd.get_dummies(df[['INFO_1','INFO_2']],prefix='',prefix_sep='')],axis=1)

print(df1)
  ARTID  some_info_11  some_info_12  some_info_13  some_info_21  \
0 00001             1             0             0             1
1 00002             0             1             0             0
2 00003             0             0             1             0

   some_info_22  some_info_23
0             0             0
1             1             0
2             0             1
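
The question asks for a sparse result, while the call above returns ordinary dense columns. A minimal sketch of one way to keep the indicators sparse, assuming a pandas version whose get_dummies() accepts the sparse and dtype arguments; the sample frame is rebuilt from the question, and the uint8 dtype is an illustrative choice rather than part of the original answer:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ARTID": ["00001", "00002", "00003"],
    "INFO_1": ["some_info_11", "some_info_12", "some_info_13"],
    "INFO_2": ["some_info_21", "some_info_22", "some_info_23"],
})

# sparse=True backs each indicator column with a SparseArray, so only the
# non-zero entries are stored; uint8 is an assumption made here to keep
# the stored values small.
dummies = pd.get_dummies(
    df[["INFO_1", "INFO_2"]],
    prefix="",
    prefix_sep="",
    sparse=True,
    dtype="uint8",
)

# Same layout as the answer: ARTID first, then the indicator columns.
df1 = pd.concat([df["ARTID"], dummies], axis=1)

print(dummies.dtypes)
```

The savings only become noticeable when there are many distinct values, since each row then holds a handful of ones among mostly zeros.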


If ARTID is allowed to be the index, you can use:

pd.get_dummies(df[['INFO_1','INFO_2']],prefix='',prefix_sep='').set_index(df.ARTID)

             some_info_11  some_info_12  some_info_13  some_info_21  some_info_22  \
ARTID
00001                 1             0             0             1             0
00002                 0             1             0             0             1
00003                 0             0             1             0             0

          some_info_23
ARTID
00001                 0
00002                 0
00003                 1
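
If what is needed is an actual SciPy sparse matrix rather than a DataFrame, the wide frame can be skipped entirely. The following is a sketch under stated assumptions, not the method from the answer: it assumes SciPy is installed, and the names long, row_codes, col_codes and matrix are illustrative. It reshapes the data into (ARTID, value) pairs and builds a CSR matrix directly from integer codes:

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ARTID": ["00001", "00002", "00003"],
    "INFO_1": ["some_info_11", "some_info_12", "some_info_13"],
    "INFO_2": ["some_info_21", "some_info_22", "some_info_23"],
})

# One row per (ARTID, value) pair instead of one column per value.
long = df.melt(id_vars="ARTID", value_vars=["INFO_1", "INFO_2"])

# Map the row and column labels to integer positions.
row_codes, row_labels = pd.factorize(long["ARTID"])
col_codes, col_labels = pd.factorize(long["value"])

# Place a 1 at every (row, column) pair. Duplicate pairs would be summed,
# so a value repeated for the same ARTID would yield a count, not a flag.
data = np.ones(len(long), dtype=np.uint8)
matrix = sparse.csr_matrix(
    (data, (row_codes, col_codes)),
    shape=(len(row_labels), len(col_labels)),
)

print(matrix.toarray())
```

row_labels and col_labels hold the ARTID and value belonging to each row and column position, so they need to be kept alongside the matrix to interpret it later.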

Regarding "python - converting a pandas df from long to wide as a sparse matrix", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54197329/
