python - Regex for splitting: break a word into morphemes or affixes

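I am trying to split a word into its components, such as prefixes and suffixes (i.e. morphemes or affixes), and get the result as a list.
I have tried regular expressions with the re.findall function, as shown below: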

>>> import re
>>> affixes = ['meth','eth','ketone', 'di', 'chloro', 'yl', 'ol']
>>> word = 'dimethylamin0ethanol'
>>> re.findall('|'.join(affixes), word)
['di', 'meth', 'yl', 'eth', 'ol']
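findall loses 'amin0' and 'an' because it returns only the matched text. A minimal sketch using re.finditer, which also exposes each match's start and end offsets, makes the dropped gaps visible (the variable names `spans`, `gaps` and `prev_end` are illustrative only):

```python
import re

affixes = ['meth', 'eth', 'ketone', 'di', 'chloro', 'yl', 'ol']
word = 'dimethylamin0ethanol'
pattern = '|'.join(affixes)

# Each match object carries the matched text and its (start, end) span.
spans = [(m.group(), m.span()) for m in re.finditer(pattern, word)]
print(spans)
# [('di', (0, 2)), ('meth', (2, 6)), ('yl', (6, 8)), ('eth', (13, 16)), ('ol', (18, 20))]

# Text between consecutive matches (and after the last one) is what findall discards.
gaps = []
prev_end = 0
for _, (start, end) in spans:
    if start > prev_end:
        gaps.append(word[prev_end:start])
    prev_end = end
if prev_end < len(word):
    gaps.append(word[prev_end:])
print(gaps)
# ['amin0', 'an']
```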

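However, I also need to include the parts that are not matched. For the example above, the required output is:
['di', 'meth', 'yl', 'amin0', 'eth', 'an', 'ol']
Does anyone know how to get these unmatched pieces as well?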

Best answer

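You can use re.split() with a capturing group around the pattern; when the pattern contains a group, re.split() keeps the matched "separators" (here, the affixes) in the result alongside the text between them: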

In [1]: import re

In [2]: affixes = ['meth', 'eth', 'ketone', 'di', 'chloro', 'yl', 'ol']

In [3]: word = 'dimethylamin0ethanol'

In [4]: [match for match in re.split('(' + '|'.join(affixes) + ')', word) if match]
Out[4]: ['di', 'meth', 'yl', 'amin0', 'eth', 'an', 'ol']

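The list comprehension filters out the empty strings that re.split() produces at the start or end of the word and between two adjacent affixes (for example between 'di' and 'meth').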
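If the affix list comes from data, two details of the one-liner may be worth guarding against: affixes containing regex metacharacters, and alternation order, since Python's re tries the alternatives left to right and takes the first one that matches at a position rather than the longest. A minimal sketch of a reusable helper, assuming longest-match-first is the desired behaviour; `split_affixes` is a hypothetical name, not a library function:

```python
import re

def split_affixes(word, affixes):
    """Split word into known affixes and the unmatched pieces between them."""
    # Escape each affix so characters like '.' or '(' are matched literally,
    # and try longer affixes first so 'ethyl' wins over 'eth' at the same position.
    alternatives = sorted(map(re.escape, affixes), key=len, reverse=True)
    # The capturing group makes re.split keep the matched affixes in the result.
    pattern = '(' + '|'.join(alternatives) + ')'
    # re.split yields '' at the ends and between adjacent matches; drop those.
    return [piece for piece in re.split(pattern, word) if piece]

print(split_affixes('dimethylamin0ethanol',
                    ['meth', 'eth', 'ketone', 'di', 'chloro', 'yl', 'ol']))
# ['di', 'meth', 'yl', 'amin0', 'eth', 'an', 'ol']

# Without the length sort, 'eth|ethyl|ol' would give ['eth', 'yl', 'ol'] here.
print(split_affixes('ethylol', ['eth', 'ethyl', 'ol']))
# ['ethyl', 'ol']
```

Whether the longest affix should win is a linguistic choice; if the original list order encodes priority, drop the `sorted` call.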

For python - Regex for splitting: break a word into morphemes or affixes, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40988123/