python - Matching names with international characters in the first or last name

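I'm trying to capture names by assuming they take the form Firstname Lastname. The code below handles that well, but I'd also like to capture international names such as Pär Åberg. I've found a few solutions, but unfortunately they don't seem to work with Python-flavored regular expressions. Does anyone have any ideas?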

#!/usr/bin/python
# -*- coding: utf-8 -*-
import re

text = """
This is a text containing names of people in the text such as
Hillary Clinton or Barack Obama. My problem is with names that uses stuff
outside A-Z like Swedish names such as Pär Åberg."""

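# The pattern has capturing groups, so findall returns one tuple per match;
# name[0] is the outermost group, i.e. the whole matched name.
for name in re.findall(r"(([A-Z])[\w-]*(\s+[A-Z][\w-]*)+)", text):
    firstname = name[0].split()[0]
    print(firstname)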

Best answer

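You need an alternative regex library, such as the third-party regex module, because the built-in re module does not support \p{L}, which matches any Unicode letter.

Then use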

ur'\p{Lu}[\w-]*(?:\s+\p{Lu}[\w-]*)+'
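
To see which characters \p{Lu} is meant to cover, the standard unicodedata module can report each character's Unicode general category. This is only an illustrative sketch: it assumes Python 3, and is_uppercase_letter is a hypothetical helper name, not part of any library.

```python
import unicodedata


def is_uppercase_letter(ch):
    # Hypothetical helper: True when ch is in general category "Lu",
    # the category that \p{Lu} refers to.
    return unicodedata.category(ch) == "Lu"


# Print the category of a few sample characters next to the helper's verdict.
for ch in "AÅÉäz1-":
    print(repr(ch), unicodedata.category(ch), is_uppercase_letter(ch))
```

Here "Å" is reported as an uppercase letter just like "A", which is why \p{Lu} can replace [A-Z] at the start of each name part.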

When the regex is initialized with a Unicode string, the UNICODE flag is applied automatically.
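
A minimal sketch of how this could look end to end, assuming Python 3 and that the third-party regex package is installed (pip install regex). The ur'' prefix is Python 2 syntax and is not valid in Python 3, so a plain raw string is used here; every str is already Unicode there. The names NAME_PATTERN, names and first_names are chosen for illustration only.

```python
# Assumes Python 3 and the third-party "regex" package (pip install regex).
import regex

text = """
This is a text containing names of people in the text such as
Hillary Clinton or Barack Obama. My problem is with names that uses stuff
outside A-Z like Swedish names such as Pär Åberg."""

# \p{Lu} matches any uppercase Unicode letter, so "Å" qualifies as well as "A".
# The pattern is a str, so \w also covers letters such as "ä" by default.
NAME_PATTERN = regex.compile(r"\p{Lu}[\w-]*(?:\s+\p{Lu}[\w-]*)+")

# No capturing groups are used, so findall returns each full match as a string.
names = NAME_PATTERN.findall(text)
first_names = [name.split()[0] for name in names]

for first_name in first_names:
    print(first_name)
```

Because the group is non-capturing, each result is the full name rather than a tuple, so the first name comes from name.split()[0] instead of name[0].split()[0] as in the question.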

This entry on matching names with international characters in Python is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/33739909/