python - How do I use re.sub to add tags to certain strings in Python?

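I am trying to add tags around the words of a given query string, and the tags should wrap every run of matching words.
For example, I want to wrap tags around all the words that match the query iphone games mac in the sentence I love downloading iPhone games from my mac. The result should be I love downloading <em>iPhone games</em> from my <em>mac</em>.
So far, I have tried: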

import re

sentence = "I love downloading iPhone games from my mac."
query = r'((iphone|games|mac)\s*)+'
regex = re.compile(query, re.I)
sentence = regex.sub(r'<em>\1</em> ', sentence)

The resulting sentence is:

I love downloading <em>games </em> from my <em>mac</em> .

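Here \1 is replaced by only one word (games instead of iPhone games), and there is some unwanted whitespace after the words. How should I write the regular expression to get the desired output? Thanks!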

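Edit:
I just realized that both Fred's and Chris's solutions have a problem when a query word occurs inside a longer word. For example, if my query is game, then games becomes <em>game</em>s, and I do not want that to be highlighted. Another example: the inside either should not be highlighted.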
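The word-inside-a-word problem can be reproduced with a minimal sketch. The pattern below is a simplified stand-in chosen for illustration (an assumption), not either answerer's exact solution; it only shows what happens when the alternation has no word-boundary anchors.

```python
import re

# Simplified stand-in pattern with no word boundaries (illustrative assumption).
unanchored = re.compile(r'(game|the)', re.I)

# The query words match inside longer words, which is the unwanted behaviour.
print(unanchored.sub(r'<em>\1</em>', 'games'))   # <em>game</em>s
print(unanchored.sub(r'<em>\1</em>', 'either'))  # ei<em>the</em>r
```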

Edit 2:
I adopted Chris's new solution, and it works.

Best answer

First, to get the whitespace the way you want it, replace \s* with \s*? to make it non-greedy.

First fix:

>>> re.compile(r'(((iphone|games|mac)\s*?)+)', re.I).sub(r'<em>\1</em>', sentence)
'I love downloading <em>iPhone</em> <em>games</em> from my <em>mac</em>.'

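Unfortunately, once \s* is non-greedy, it splits the phrase, as you can see. Without that change (that is, with the greedy \s*), the two words are grouped together, but the trailing space ends up inside the tag: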

>>> re.compile(r'(((iPhone|games|mac)\s*)+)').sub(r'<em>\1</em>', sentence)
'I love downloading <em>iPhone games </em>from my <em>mac</em>.'

I can't yet think of a way around this.

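Also note that in these I have added an extra set of parentheses around the whole + repetition, so that \1 captures the entire run of matches rather than only the last repetition. That is the difference from your original pattern.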
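Why the extra parentheses matter can be seen by inspecting the groups directly. This is a small sketch using the same greedy pattern as above; the variable name m is just for illustration. In Python's re module, a capture group that sits inside a repetition keeps only the text of its last iteration, while a group wrapped around the repetition keeps the whole run.

```python
import re

sentence = "I love downloading iPhone games from my mac."

# Group 1 wraps the whole repetition; groups 2 and 3 are inside it.
m = re.search(r'(((iphone|games|mac)\s*)+)', sentence, re.I)

print(repr(m.group(1)))  # 'iPhone games '  (the entire run)
print(repr(m.group(2)))  # 'games '         (last iteration only)
print(repr(m.group(3)))  # 'games'          (last iteration only)
```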

Further update: actually, I can think of a way to fix it. You can decide whether you want it this way.

>>> regex = re.compile(r'((iphone|games|mac)(\s*(iphone|games|mac))*)', re.I)
>>> regex.sub(r'<em>\1</em>', sentence)
'I love downloading <em>iPhone games</em> from my <em>mac</em>.'

Update: taking into account your point about word boundaries, we just need to add a few instances of \b, the word-boundary matcher.

>>> regex = re.compile(r'(\b(iphone|games|mac)\b(\s*(iphone|games|mac)\b)*)', re.I)
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhone games from my mac')
'I love downloading <em>iPhone games</em> from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhone gameses from my mac')
'I love downloading <em>iPhone</em> gameses from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhoney games from my mac')
'I love downloading iPhoney <em>games</em> from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhoney gameses from my mac')
'I love downloading iPhoney gameses from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading miPhone gameses from my mac')
'I love downloading miPhone gameses from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading miPhone games from my mac')
'I love downloading miPhone <em>games</em> from my <em>mac</em>'
>>> regex.sub(r'<em>\1</em>', 'I love downloading iPhone igames from my mac')
'I love downloading <em>iPhone</em> igames from my <em>mac</em>'
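The word-boundary pattern above could be built from an arbitrary query string rather than hard-coded. The following is a sketch only: the function name highlight and its tag parameter are hypothetical, and it assumes the query is a whitespace-separated list of plain words. It uses re.escape so that query words are treated literally, and a replacement function so that the matched text is inserted unchanged.

```python
import re

def highlight(sentence, query, tag="em"):
    """Wrap each run of whole-word query matches in <tag>...</tag>.

    Hypothetical helper; assumes `query` is whitespace-separated plain words.
    """
    words = [re.escape(w) for w in query.split()]
    if not words:
        return sentence  # nothing to highlight
    alt = "|".join(words)
    # Same shape as the pattern above: one query word, followed by any
    # number of further query words separated by optional whitespace,
    # each one ending at a word boundary.
    pattern = re.compile(r'\b(?:%s)\b(?:\s*(?:%s)\b)*' % (alt, alt), re.I)
    return pattern.sub(lambda m: "<%s>%s</%s>" % (tag, m.group(0), tag), sentence)

print(highlight("I love downloading iPhone games from my mac.", "iphone games mac"))
# I love downloading <em>iPhone games</em> from my <em>mac</em>.
```

Note that \b relies on the query words starting and ending with word characters; a query term such as c++ would need different anchoring.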

Regarding "python - How do I use re.sub to add tags to certain strings in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4221509/