I have a pandas DataFrame like this:
id comment
1 its not proper
2 improvement needed
3 organization is proper
4 registration not done
5 timelines not proper
For the set of words ['proper', 'organization', 'done'], I want to count the number of ids whose comment contains each word. So the output should be:
proper 3
organization 1
done 1
I tried using a for loop:
word_list = ['proper', 'organization', 'done']
final_list = {'proper': 0, 'organization': 0, 'done': 0}
for index, row in data.iterrows():
    for word in word_list:
        # Count the row once if the word is one of its space-separated tokens
        if word in row['comment'].split(' '):
            final_list[word] += 1
Is there a way to do this without a for loop?
Best answer
You can call str.contains inside a list comprehension over words and sum the resulting boolean values.
In [23]: words = ['proper','organization','done']
In [24]: pd.DataFrame([[wrd, df['comment'].str.contains(wrd).sum()] for wrd in words])
Out[24]:
0 1
0 proper 3
1 organization 1
2 done 1
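One caveat: str.contains matches substrings, so a comment containing "improper" would also count toward "proper", whereas the loop in the question only counts whole space-separated tokens. A minimal sketch of a whole-word variant follows, assuming that regex word boundaries are an acceptable definition of a "word". The sample DataFrame is rebuilt from the question, and the name counts is only illustrative.

```python
import re

import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "comment": [
        "its not proper",
        "improvement needed",
        "organization is proper",
        "registration not done",
        "timelines not proper",
    ],
})
words = ["proper", "organization", "done"]

# \b restricts each match to a whole word; re.escape guards against
# regex metacharacters in the search terms.
counts = pd.Series({
    w: int(df["comment"].str.contains(rf"\b{re.escape(w)}\b", regex=True).sum())
    for w in words
})
print(counts)
```

On the sample data this gives the same counts as the answer above; the two approaches only differ when a search word appears inside a longer word.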