I have a string like this:
my_str ='·in this match, dated may 1, 2013 (the "the match") is between brooklyn centenniel, resident of detroit, michigan ("champion") and kamil kubaru, the challenger from alexandria, virginia ("underdog").'
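Now I want to extract the current champion and underdog, using the keywords champion and underdog. What makes this challenging is that both contenders' names appear before the keyword, and the keyword itself sits inside parentheses. I would like to extract this information with a regular expression.
Here is what I have tried: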
import re
champion = re.findall(r'("champion"[^.]*.)', my_str)
print(champion)
>> ['"champion") and kamil kubaru, the challenger from alexandria, virginia ("underdog").']
underdog = re.findall(r'("underdog"[^.]*.)', my_str)
print(underdog)
>> ['"underdog").']
However, the results I need are:
champion: brooklyn centenniel, resident of detroit, michigan
underdog: kamil kubaru, the challenger from alexandria, virginia
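How can I do this with a regular expression? (I have been searching for a way to step back a word or a few words from the keyword to get the desired result, but have had no luck so far.) Any help or suggestions would be appreciated.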
Best answer
You can use named capture groups to capture the desired results:
between\s+(?P<champion>.*?)\s+\("champion"\)\s+and\s+(?P<underdog>.*?)\s+\("underdog"\)
between\s+(?P<champion>.*?)\s+\("champion"\)
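matches the chunk from between to ("champion") and captures the desired part in the named capture group champion. Because .*? is lazy, the group stops at the first ("champion") that follows.
After that,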
\s+and\s+(?P<underdog>.*?)\s+\("underdog"\)
matches the chunk up to ("underdog") and again captures the desired part in the named capture group underdog.
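To make the breakdown above easier to follow, here is a sketch of the same pattern written with re.VERBOSE, so that each piece carries its own comment. The variable names pattern and match are illustrative and not part of the original answer; the regular expression itself is unchanged.

```python
import re

my_str = ('·in this match, dated may 1, 2013 (the "the match") is between '
          'brooklyn centenniel, resident of detroit, michigan ("champion") and '
          'kamil kubaru, the challenger from alexandria, virginia ("underdog").')

# Same regex as above; re.VERBOSE ignores the layout whitespace and the comments.
pattern = re.compile(r'''
    between\s+              # the word between, then whitespace
    (?P<champion>.*?)       # lazily capture the first party
    \s+\("champion"\)       # whitespace, then the literal ("champion")
    \s+and\s+               # the word and, separating the two parties
    (?P<underdog>.*?)       # lazily capture the second party
    \s+\("underdog"\)       # whitespace, then the literal ("underdog")
''', re.VERBOSE)

match = pattern.search(my_str)
print(match.group('champion'))
print(match.group('underdog'))
```

Since the pattern only uses \s+ for whitespace and never a literal space, switching to verbose mode should not change what it matches.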
Example:
In [26]: my_str ='·in this match, dated may 1, 2013 (the "the match") is between brooklyn centenniel, resident of detroit, michigan ("champion") and kamil kubaru, the challenger from alexandria, virginia ("underdog").'
In [27]: out = re.search(r'between\s+(?P<champion>.*?)\s+\("champion"\)\s+and\s+(?P<underdog>.*?)\s+\("underdog"\)', my_str)
In [28]: out.groupdict()
Out[28]:
{'champion': 'brooklyn centenniel, resident of detroit, michigan',
'underdog': 'kamil kubaru, the challenger from alexandria, virginia'}
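One thing the session above does not show is that re.search returns None when the text does not fit the pattern, so calling groupdict() directly would raise an AttributeError. Here is a minimal sketch of a wrapper that guards against that. The names PARTIES_RE and extract_parties are hypothetical helpers, not part of the original answer.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
PARTIES_RE = re.compile(
    r'between\s+(?P<champion>.*?)\s+\("champion"\)'
    r'\s+and\s+(?P<underdog>.*?)\s+\("underdog"\)'
)

def extract_parties(text):
    """Return a dict with 'champion' and 'underdog', or None if text does not match."""
    match = PARTIES_RE.search(text)
    return match.groupdict() if match else None

my_str = ('·in this match, dated may 1, 2013 (the "the match") is between '
          'brooklyn centenniel, resident of detroit, michigan ("champion") and '
          'kamil kubaru, the challenger from alexandria, virginia ("underdog").')

result = extract_parties(my_str)
missing = extract_parties('a sentence without either keyword')
print(result)
print(missing)
```

Returning None keeps the no-match case explicit for the caller. Raising an exception instead would be an equally reasonable choice, depending on how the surrounding code handles errors.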
Regarding python - extracting the words/sentence that appear before a keyword from a string in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48953985/