Problem description
I currently use re.findall to find and isolate the words that follow the '#' character (hashtags) in a string:
hashtags = re.findall(r'#([A-Za-z0-9_]+)', str1)
It searches str1 and finds all the hashtags. This works, but it doesn't account for accented characters such as áéíóúñü¿.
If one of these letters is in str1, the hashtag is saved only up to the letter before it. So, for example, #yogenfrüz would be saved as #yogenfr.
I need to be able to account for all the accented letters used in German, Dutch, French and Spanish, so that I can save hashtags like #yogenfrüz.
How can I do this?
Recommended answer
Try the following:
hashtags = re.findall(r'#(\w+)', str1, re.UNICODE)
EDIT: Check the useful comment below from Martijn Pieters.
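To make the difference between the two patterns concrete, here is a minimal sketch that runs both against the same text. The helper name find_hashtags and the sample string are made up for illustration; they are not part of the original question or answer. It assumes Python 3, where \w already matches Unicode word characters for str patterns, so re.UNICODE is redundant there but harmless. The NFC normalization step is an extra precaution added here, not something the answer calls for: a letter stored as a base character plus a combining mark would otherwise still cut the match short.

```python
import re
import unicodedata

# Pattern from the question: ASCII letters, digits and underscore only.
ASCII_HASHTAG = re.compile(r'#([A-Za-z0-9_]+)')
# Pattern from the answer: \w with re.UNICODE also matches accented letters.
UNICODE_HASHTAG = re.compile(r'#(\w+)', re.UNICODE)


def find_hashtags(text):
    """Return hashtag words from text (hypothetical helper for illustration)."""
    # Assumption: compose "u" + combining diaeresis into a single "ü" first,
    # because a bare combining mark is not matched by \w.
    normalized = unicodedata.normalize('NFC', text)
    return UNICODE_HASHTAG.findall(normalized)


str1 = "Loving #yogenfrüz and #café_olé today, also #niño2024!"

ascii_result = ASCII_HASHTAG.findall(str1)
unicode_result = find_hashtags(str1)

print(ascii_result)    # ['yogenfr', 'caf', 'ni']
print(unicode_result)  # ['yogenfrüz', 'café_olé', 'niño2024']
```

Note that \w covers letters, digits and the underscore only, so a punctuation character such as ¿ still ends a hashtag.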