Problem description

I have the following DataFrame:

import pandas as pd

data = {'speaker':['Adam','Ben','Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)

I want to count the words in the speech column, but only the words that appear in a predefined list. For example, the list is:

wordlist = ['much', 'good','right']

I want to generate a new column that shows how often these three words occur in each row. My expected output is therefore:

  speaker                                             speech  words
0    Adam            Thank you very much and good afternoon.      2
1     Ben  Let me clarify that because I want to make sur...      1
2   Clair              By now you should have some good rest      1

I tried:

df['total'] = 0
for word in df['speech'].str.split():
    if word in wordlist:
        df['total'] += 1

But after running it, the total column is always zero. I am wondering what's wrong with my code.
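The loop stays at zero because iterating over df['speech'].str.split() yields one list of tokens per row, not individual words, so `word in wordlist` asks whether a whole list is an element of wordlist, which is never true. Even if the test did pass, `df['total'] += 1` would increment every row at once rather than the current one. A minimal corrected version of the same loop idea is sketched below for comparison; it assumes that splitting on whitespace and stripping surrounding punctuation with `str.strip(string.punctuation)` is good enough for this sample text.

```python
import string

import pandas as pd

data = {'speaker': ['Adam', 'Ben', 'Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)
wordlist = ['much', 'good', 'right']

totals = []
for tokens in df['speech'].str.split():  # each item is a list of tokens for one row
    # strip surrounding punctuation so "afternoon." is compared as "afternoon"
    count = sum(1 for token in tokens if token.strip(string.punctuation) in wordlist)
    totals.append(count)  # one count per row, in row order
df['total'] = totals

print(df['total'].tolist())  # [2, 1, 1]
```

This works, but it runs a Python loop per row; the vectorised answer below pushes the matching into pandas' string methods instead.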

Recommended answer

You could use the following vectorised approach:

import re

import pandas as pd

data = {'speaker':['Adam','Ben','Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)

wordlist = ['much', 'good','right']

# builds \b(?:much|good|right)\b so every listed word is counted only as a whole word
pattern = r'\b(?:' + '|'.join(map(re.escape, wordlist)) + r')\b'
df['total'] = df['speech'].str.count(pattern)

Which gives:

>>> df
  speaker                                             speech  total
0    Adam            Thank you very much and good afternoon.      2
1     Ben  Let me clarify that because I want to make sur...      1
2   Clair              By now you should have some good rest      1
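Two details of the regex matter here. First, the \b word-boundary anchors need to wrap the whole alternation: a plain r'\b|\b'.join(wordlist) produces much\b|\bgood\b|\bright, where the first word has no leading boundary and the last has no trailing one (so "right" would also be counted inside "righteous"). A non-capturing group (?:...) lets a single pair of \b anchors cover every word, and re.escape protects list entries that contain regex metacharacters. Second, matching is case-sensitive by default; Series.str.count accepts a flags argument, so passing re.IGNORECASE also counts "Good" or "MUCH". The sketch below uses a hypothetical second sentence to show the difference.

```python
import re

import pandas as pd

wordlist = ['much', 'good', 'right']
# hypothetical sample sentences; the second one uses capitalised words
speech = pd.Series(['Thank you very much and good afternoon.',
                    'Good news: MUCH better, right?'])

# one pair of \b anchors around the whole group, so every word must match as a whole word
pattern = r'\b(?:' + '|'.join(map(re.escape, wordlist)) + r')\b'

case_sensitive = speech.str.count(pattern)
case_insensitive = speech.str.count(pattern, flags=re.IGNORECASE)

print(pattern)                    # \b(?:much|good|right)\b
print(case_sensitive.tolist())    # [2, 1]
print(case_insensitive.tolist())  # [2, 3]
```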
