python - Return the count of multiple words present in a pandas column

I have a pandas DataFrame like the one below, with a column named "texts":

texts
throne one
bar one
foo two
bar three
foo two
bar two
foo one
foo three
one three
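For reproducibility, here is a minimal sketch of how this frame could be built. The variable name `df` is an assumption here, chosen to match the name used in the answer.

```python
import pandas as pd

# Rebuild the sample data from the question as a one-column DataFrame.
df = pd.DataFrame({
    "texts": [
        "throne one", "bar one", "foo two", "bar three", "foo two",
        "bar two", "foo one", "foo three", "one three",
    ]
})
print(df)
```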

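For each row, I want to count how many of the three words 'one', 'two', 'three' appear, counting a match only when it is a whole word. The expected output looks like this: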

    texts   counts
    throne one  1
    bar one     1
    foo two     1
    bar three   1
    foo two     1
    bar two     1
    foo one     1
    foo three   1
    one three   2

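As you can see, the count for the first row is 1: "throne" is not treated as one of the searched values, because the "one" inside it is not a whole word but part of "throne".
Any help?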

Best answer

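Use pd.Series.str.count with a regex built by joining the words with '|'. Note that this plain alternation also matches inside longer words, so 'throne one' is counted as 2: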

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(words)))

        texts  counts
0  throne one       2
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
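The count of 2 for the first row arises because the pattern 'one|two|three' also matches the "one" inside "throne". A quick check with the standard `re` module (an illustration, not part of the original answer) makes this visible:

```python
import re

# The alternation matches anywhere in the string, including inside longer words.
pattern = "|".join(["one", "two", "three"])  # 'one|two|three'
matches = re.findall(pattern, "throne one")
print(matches)  # ['one', 'one']: one hit inside 'throne', one standalone
```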

To avoid counting the 'one' inside 'throne', we can add word boundaries (\b) to the regex:

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(map(r'\b{}\b'.format, words))))

        texts  counts
0  throne one       1
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
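As a self-contained sketch of the same idea, the pattern can also be written with a single pair of word boundaries around a non-capturing group. The use of `re.escape` is an addition here, assuming the word list might someday contain regex metacharacters. The three-row frame is just a subset of the question's data.

```python
import re
import pandas as pd

# A small subset of the question's data, enough to show the boundary effect.
df = pd.DataFrame({"texts": ["throne one", "bar one", "one three"]})
words = ["one", "two", "three"]

# \b(?:one|two|three)\b : escape each word, then wrap the alternation once.
pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
counts = df["texts"].str.count(pattern)
print(df.assign(counts=counts))
```

Here 'throne one' yields 1, because `\b` requires a word boundary on both sides of the match.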

For flair, use a raw f-string (available from Python 3.6):

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(fr'\b{w}\b' for w in words)))

        texts  counts
0  throne one       1
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
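The two ways of building the pattern should produce the same string, so the choice is purely stylistic. A small check, where the names `via_format` and `via_fstring` are made up for illustration:

```python
words = "one two three".split()

# str.format applied through map, as in the word-boundary version.
via_format = "|".join(map(r"\b{}\b".format, words))
# Raw f-string inside a generator expression (Python 3.6+).
via_fstring = "|".join(fr"\b{w}\b" for w in words)

print(via_fstring)  # \bone\b|\btwo\b|\bthree\b
print(via_format == via_fstring)  # True
```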

A similar question about python - returning the count of multiple words present in a pandas column can be found on Stack Overflow: https://stackoverflow.com/questions/49676597/