I have a CSV file that looks like this:

index,labels
1,created the tower
2,destroyed the tower
3,created the swimming pool
4,destroyed the swimming pool
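To reproduce the example, the sample above can be loaded with pandas. This is a minimal sketch that assumes the data is embedded as a string through `io.StringIO`, since the real filename is not given; with an actual file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# The sample from the question, embedded as text (a real file path would work the same way)
csv_text = """index,labels
1,created the tower
2,destroyed the tower
3,created the swimming pool
4,destroyed the swimming pool
"""

# index_col='index' turns the 1-based ids into the row index instead of a data column
df = pd.read_csv(io.StringIO(csv_text), index_col='index')
print(df)
```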

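Now, suppose I pass a list of the columns I want in place of the labels column (the list does not contain every word that appears in labels):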
['created','tower','destroyed','swimming pool']

I want to get this DataFrame:
index,created,destroyed,tower,swimming pool
1,1,0,1,0
2,0,1,1,0
3,1,0,0,1
4,0,1,0,1

I looked into get_dummies, but it didn't help.
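A short sketch of why the dummy-encoding functions do not fit here: `pd.get_dummies` makes one column per distinct whole label, while `Series.str.get_dummies(sep=' ')` splits on spaces, so it produces a column for the filler word "the" and breaks "swimming pool" into two separate columns. Neither can be told to look only for a given list of multi-word terms.

```python
import pandas as pd

labels = pd.Series(['created the tower', 'destroyed the tower',
                    'created the swimming pool', 'destroyed the swimming pool'])

# One indicator column per distinct full string: four columns, one per label
whole = pd.get_dummies(labels)
print(list(whole.columns))

# Split on spaces: one column per token, so 'swimming pool' becomes 'swimming' and 'pool'
tokens = labels.str.get_dummies(sep=' ')
print(list(tokens.columns))
```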

Best answer

You can call str.contains once per requested term in a loop, then concatenate the boolean results side by side:

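import pandas as pd

df = pd.DataFrame({'labels': ['created the tower', 'destroyed the tower',
                              'created the swimming pool', 'destroyed the swimming pool']})
print(df)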

                        labels
0            created the tower
1          destroyed the tower
2    created the swimming pool
3  destroyed the swimming pool

req = ['created', 'destroyed', 'tower', 'swimming pool']

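# One boolean Series per term (literal substring match); axis=1 places them side by side
# and keys=req names the resulting columns. Newer pandas requires axis as a keyword.
out = pd.concat([df['labels'].str.contains(x, regex=False) for x in req], axis=1, keys=req).astype(int)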
print(out)

   created  destroyed  tower  swimming pool
0        1          0      1              0
1        0          1      1              0
2        1          0      0              1
3        0          1      0              1
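One caveat: `str.contains` matches substrings, so a term like `tower` would also match a label such as "watchtower". A possible variant, sketched under the assumption that whole-word matches are wanted, anchors each term with `\b` word boundaries (escaping it with `re.escape` in case a term contains regex metacharacters) and keeps the question's 1-based `index` and column order.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {'labels': ['created the tower', 'destroyed the tower',
                'created the swimming pool', 'destroyed the swimming pool']},
    index=pd.Index([1, 2, 3, 4], name='index'),
)
req = ['created', 'destroyed', 'tower', 'swimming pool']

# Build one 0/1 column per term; \b...\b requires the term to sit on word boundaries
out = pd.DataFrame(
    {term: df['labels'].str.contains(r'\b' + re.escape(term) + r'\b', regex=True).astype(int)
     for term in req},
    index=df.index,
)
print(out)
```

The dict comprehension keeps columns in the order of `req`, so reordering the list reorders the output.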

Regarding "python - Pandas: generate columns from a single string column", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46046092/

10-11 01:29