Conditional word frequency count in pandas
Question
I have a DataFrame like the following:
import pandas as pd

data = {'speaker': ['Adam', 'Ben', 'Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)
I want to count the words in the speech column, but only those that appear in a predefined list. For example, the list is:
wordlist = ['much', 'good','right']
I want to generate a new column that shows how many times these three words occur in each row. My expected output is therefore:
   speaker                                             speech  words
0     Adam            Thank you very much and good afternoon.      2
1      Ben  Let me clarify that because I want to make sur...      1
2    Clair              By now you should have some good rest      1
I tried:
df['total'] = 0
for word in df['speech'].str.split():
    if word in wordlist:
        df['total'] += 1
But after running it, the total column is always zero. What is wrong with my code?
Recommended answer
You could use the following vectorised approach:
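import re
import pandas as pd

data = {'speaker': ['Adam', 'Ben', 'Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)
wordlist = ['much', 'good', 'right']
# Group the alternation so \b guards both ends of every word
# (a plain r'\b|\b'.join leaves the first and last word half-anchored)
pattern = r'\b(?:' + '|'.join(map(re.escape, wordlist)) + r')\b'
df['total'] = df['speech'].str.count(pattern)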
Which gives:
>>> df
   speaker                                             speech  total
0     Adam            Thank you very much and good afternoon.      2
1      Ben  Let me clarify that because I want to make sur...      1
2    Clair              By now you should have some good rest      1
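As for why the original loop always leaves total at zero: iterating over df['speech'].str.split() yields one list of tokens per row rather than individual words, so the test `word in wordlist` compares a whole list against the strings in wordlist and is never true. Even if it were true, `df['total'] += 1` would increment every row at once. The sketch below illustrates this and shows one possible row-wise alternative. The helper name count_listed_words and the `\w+` tokenisation are assumptions made for illustration, not part of the original answer.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'speaker': ['Adam', 'Ben', 'Clair'],
    'speech': ['Thank you very much and good afternoon.',
               'Let me clarify that because I want to make sure we have got everything right',
               'By now you should have some good rest'],
})
wordlist = ['much', 'good', 'right']

# Each item produced by iterating over .str.split() is a list of tokens for one row.
first_tokens = next(iter(df['speech'].str.split()))
print(type(first_tokens).__name__)   # list
print(first_tokens in wordlist)      # False: a list never equals any string in wordlist

# Hypothetical row-wise helper: tokenise one speech and count tokens found in the list.
targets = set(wordlist)

def count_listed_words(text):
    tokens = re.findall(r'\w+', text)   # assumed tokenisation: runs of word characters
    return sum(token in targets for token in tokens)

df['total'] = df['speech'].apply(count_listed_words)
print(df['total'].tolist())          # [2, 1, 1]
```

This version matches case-sensitively, like the regex approach above. For large frames the vectorised str.count call is likely the better choice, while the row-wise form is easier to adapt if the tokenisation rules need to change.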