I have a pandas dataframe like the one below, with a column named "texts":
texts
throne one
bar one
foo two
bar three
foo two
bar two
foo one
foo three
one three
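For each row, I want to count occurrences of the three words 'one', 'two' and 'three', counting a match only when it is a whole word, and return the number of matches. The output should look like this: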
texts counts
throne one 1
bar one 1
foo two 1
bar three 1
foo two 1
bar two 1
foo one 1
foo three 1
one three 2
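As you can see, the count for the first row is 1, because "throne" is not treated as one of the searched values: the "one" inside it is not a whole word, just part of "throne".
Any help?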
Best answer
Use pd.Series.str.count with a regex built by joining words with '|':
words = 'one two three'.split()
df.assign(counts=df.texts.str.count('|'.join(words)))
texts counts
0 throne one 2
1 bar one 1
2 foo two 1
3 bar three 1
4 foo two 1
5 bar two 1
6 foo one 1
7 foo three 1
8 one three 2
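Row 0 gets 2 here because the plain alternation also matches the "one" inside "throne". A minimal check with the standard re module (run outside pandas, just to show the matching behavior) makes this visible:

```python
import re

# Naive alternation without word boundaries, as in the snippet above
pattern = '|'.join(['one', 'two', 'three'])  # 'one|two|three'

# 'throne' contains the substring 'one', so the pattern matches twice
matches = re.findall(pattern, 'throne one')
print(matches)  # ['one', 'one']
```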
To keep "throne" from matching, we can add word boundaries to the regex:
words = 'one two three'.split()
df.assign(counts=df.texts.str.count('|'.join(map(r'\b{}\b'.format, words))))
texts counts
0 throne one 1
1 bar one 1
2 foo two 1
3 bar three 1
4 foo two 1
5 bar two 1
6 foo one 1
7 foo three 1
8 one three 2
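For reference, here is a self-contained version of the word-boundary approach that rebuilds the question's dataframe. Wrapping each word in re.escape is an addition not in the original answer; it only matters if the word list could contain regex metacharacters such as "." or "+".

```python
import re
import pandas as pd

# The dataframe from the question
df = pd.DataFrame({'texts': ['throne one', 'bar one', 'foo two', 'bar three',
                             'foo two', 'bar two', 'foo one', 'foo three',
                             'one three']})
words = 'one two three'.split()

# \b anchors each word at word boundaries, so 'one' inside 'throne' is skipped;
# re.escape is an extra safeguard for words containing regex metacharacters
pattern = '|'.join(rf'\b{re.escape(w)}\b' for w in words)
df = df.assign(counts=df.texts.str.count(pattern))
print(df)
```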
For flair, using raw f-strings (Python 3.6+):
words = 'one two three'.split()
df.assign(counts=df.texts.str.count('|'.join(fr'\b{w}\b' for w in words)))
texts counts
0 throne one 1
1 bar one 1
2 foo two 1
3 bar three 1
4 foo two 1
5 bar two 1
6 foo one 1
7 foo three 1
8 one three 2
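If a regex is not wanted, a different technique (not the one in the answer above) is to split each row on whitespace and count tokens that exactly equal a target word. This sketch assumes words are separated only by whitespace; unlike \b, it would treat "one," with trailing punctuation as a non-matching token.

```python
import pandas as pd

df = pd.DataFrame({'texts': ['throne one', 'bar one', 'foo two', 'bar three',
                             'foo two', 'bar two', 'foo one', 'foo three',
                             'one three']})
targets = {'one', 'two', 'three'}

# str.split() with no pattern splits on whitespace, giving a list per row;
# summing booleans counts the tokens that are exactly one of the targets
df['counts'] = df['texts'].str.split().apply(lambda toks: sum(t in targets for t in toks))
print(df)
```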
Original question on Stack Overflow, "python - Return count of multiple words present in pandas column": https://stackoverflow.com/questions/49676597/