I have a pandas DataFrame like this:

id   comment

1    its not proper
2    improvement needed
3    organization is proper
4    registration not done
5    timelines not proper

For the word list ['proper', 'organization', 'done'], I want to count the number of ids (rows) in which each word appears. So the output should be:
proper         3
organization   1
done           1

I tried using a for loop:
word_list = ['proper', 'organization', 'done']
final_list = {'proper': 0, 'organization': 0, 'done': 0}
for index, row in data.iterrows():
    for word in word_list:
        # count the row if the word appears as a whole token in the comment
        if word in row['comment'].split(' '):
            final_list[word] += 1

Is there a way to do this without a for loop?

Best answer

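You can call str.contains for each word inside a list comprehension and sum the resulting boolean values to get the number of matching rows (here df is the DataFrame from the question).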

In [23]: words = ['proper','organization','done']

In [24]: pd.DataFrame([[wrd, df['comment'].str.contains(wrd).sum()] for wrd in words])
Out[24]:
              0  1
0        proper  3
1  organization  1
2          done  1
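
One caveat: str.contains matches substrings, while the loop in the question matches whole space-separated tokens, so the two can disagree on other data (for example, "improper" contains "proper"). The sketch below is one possible way to compare both behaviours. It rebuilds the sample data from the question, and the names substring_counts and whole_word_counts are illustrative, not part of the original answer. The whole-word variant assumes that a regex word boundary (\b) is an acceptable definition of "word".

```python
import re

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "comment": [
        "its not proper",
        "improvement needed",
        "organization is proper",
        "registration not done",
        "timelines not proper",
    ],
})
words = ["proper", "organization", "done"]

# Substring matching, as in the answer above (regex=False treats each
# word as a literal string rather than a pattern).
substring_counts = pd.Series(
    {w: df["comment"].str.contains(w, regex=False).sum() for w in words}
)

# Whole-word matching (assumption: \b word boundaries define a "word").
# re.escape guards against regex metacharacters in the word list.
whole_word_counts = pd.Series(
    {w: df["comment"].str.contains(rf"\b{re.escape(w)}\b").sum() for w in words}
)

print(substring_counts)
print(whole_word_counts)
```

On this sample both variants give proper 3, organization 1, done 1. They would differ only when a listed word occurs inside a longer word.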

10-05 17:50