I have a CSV file that looks like this:
index,labels
1,created the tower
2,destroyed the tower
3,created the swimming pool
4,destroyed the swimming pool
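Now, suppose I pass a list of columns that should replace the labels column (the list does not contain every word that appears in the labels column):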
['created','tower','destroyed','swimming pool']
I want to get this DataFrame:
index,created,destroyed,tower,swimming pool
1,1,0,1,0
2,0,1,1,0
3,1,0,0,1
4,0,1,0,1
I looked into get_dummies, but it did not help.
Best answer
You can call str.contains in a loop, once for each requested phrase.
print(df)
                        labels
0            created the tower
1          destroyed the tower
2    created the swimming pool
3  destroyed the swimming pool
req = ['created', 'destroyed', 'tower', 'swimming pool']
# One boolean column per phrase, joined side by side (axis=1), then cast to 0/1
out = pd.concat([df['labels'].str.contains(x) for x in req], axis=1, keys=req).astype(int)
print(out)
   created  destroyed  tower  swimming pool
0        1          0      1              0
1        0          1      1              0
2        1          0      0              1
3        0          1      0              1
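For anyone who wants to run this end to end, here is a minimal sketch of the same idea starting from the CSV in the question. It makes a few assumptions that are not in the original answer: the CSV is inlined through io.StringIO instead of being read from a file, the phrases are matched as literal substrings (regex=False) so special characters in a phrase are not treated as a pattern, missing labels count as "no match" (na=False), and phrase_indicators is just a hypothetical helper name.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = """index,labels
1,created the tower
2,destroyed the tower
3,created the swimming pool
4,destroyed the swimming pool
"""

# Keep the original "index" column as the DataFrame index.
df = pd.read_csv(io.StringIO(csv_text), index_col="index")

req = ["created", "destroyed", "tower", "swimming pool"]


def phrase_indicators(labels, phrases):
    """Hypothetical helper: build one 0/1 column per phrase.

    Each phrase is matched as a literal substring of the label;
    missing labels are treated as not containing the phrase.
    """
    columns = {
        phrase: labels.str.contains(phrase, regex=False, na=False)
        for phrase in phrases
    }
    return pd.DataFrame(columns, index=labels.index).astype(int)


out = phrase_indicators(df["labels"], req)
print(out)
```

Note that this is substring matching, so a phrase like 'created' would also match a label containing 'recreated'. If that matters for your data, a regex with word boundaries is one possible alternative.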
Regarding "python - Pandas: generate columns from a single string column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46046092/