Problem description
I'm trying to find the number of occurrences of a word in a string.
word = "dog"
str1 = "the dogs barked"
I used the following to count the occurrences:
count = str1.count(word)
The issue is I want an exact match, so the count for this sentence would be 0. Is that possible?
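To make the problem concrete, here is a minimal sketch of why str.count() over-counts here: it counts substring occurrences, so "dog" is also found inside "dogs". The variable name substring_count is only illustrative.

```python
word = "dog"
str1 = "the dogs barked"

# str.count() counts non-overlapping substring occurrences,
# so the "dog" inside "dogs" is counted as a hit.
substring_count = str1.count(word)
print(substring_count)  # 1, although the desired exact-match count is 0
```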
Solution
If you're going for efficiency:
import re
count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), input_string))
This doesn't need to create any intermediate lists (unlike split()) and thus will work efficiently for large input_string values.

It also has the benefit of working correctly with punctuation: it will properly return 1 as the count for the phrase "Mike saw a dog." (whereas an argumentless split() would not). It uses the \b regex anchor, which matches on word boundaries (transitions between \w, a.k.a. [a-zA-Z0-9_] for ASCII text, and anything else).

If you need to worry about languages beyond the ASCII character set, you may need to adjust the regex to properly match non-word characters in those languages, but for many applications this would be an overcomplication, and in many other cases setting the unicode and/or locale flags for the regex would suffice (in Python 3, str patterns are already Unicode-aware by default).
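
As a rough sketch of how the pieces above fit together, the one-liner can be wrapped in a small helper and compared against an argumentless split(). The function name count_word and its parameter names are made up for this illustration and are not part of the original answer.

```python
import re

def count_word(word, text):
    """Count whole-word occurrences of `word` in `text` (hypothetical helper)."""
    # re.escape() keeps any regex metacharacters in `word` literal;
    # \b on both sides requires a word boundary around the match.
    pattern = r'\b%s\b' % re.escape(word)
    # finditer() yields matches lazily, so no intermediate list is built.
    return sum(1 for _ in re.finditer(pattern, text))

print(count_word("dog", "the dogs barked"))    # 0
print(count_word("dog", "Mike saw a dog."))    # 1
# An argumentless split() leaves the punctuation attached ("dog."),
# so a plain list count misses the word.
print("Mike saw a dog.".split().count("dog"))  # 0
```

Note that the match is case-sensitive as written; whether that is desirable depends on the application.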