python - python: split sentences by annotations

Lists:

matrixA = []
matrixB = []

Sentences:

sentence1 = "words1 words2 words3 {matrixA} {matrixB}"
sentence2 = "words3 words4  {matrixA}"
etc.

Result:

matrixA = "words1 words2 words3", "words3 words4"
matrixB = "words1 words2 words3"
etc.

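Any ideas, or library support for this?
Import re, nltk, or something else?
It could be done manually, but I suppose it would be faster with a library.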

Best answer

First, if you have many sentences, it is sensible to put them in a list:

sentences = ["words1 words2 words3 {matrixA} {matrixB}", "words3 words4  {matrixA}"]

Next, for the various variable names such as matrix*, I recommend a defaultdict of lists from the collections module.

from collections import defaultdict
matrices = defaultdict(list)
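As a quick illustration of why a defaultdict of lists fits here: looking up a missing key creates an empty list for it, so you can call append without first checking whether the key exists. A minimal sketch, using the example names from the question as keys:

```python
from collections import defaultdict

matrices = defaultdict(list)

# No KeyError: the first access to "matrixA" creates an empty list.
matrices["matrixA"].append("words1 words2 words3")
matrices["matrixA"].append("words3 words4")
matrices["matrixB"].append("words1 words2 words3")

# With a plain dict you would need setdefault (or a membership check) instead.
plain = {}
plain.setdefault("matrixA", []).append("words1 words2 words3")

print(dict(matrices))
# {'matrixA': ['words1 words2 words3', 'words3 words4'], 'matrixB': ['words1 words2 words3']}
print(plain)
# {'matrixA': ['words1 words2 words3']}
```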

Now for the loop. To get the list of names in each sentence, use re.findall. Then, for each variable name found, append the words to that name's list in matrices.

import re

for s in sentences:
    for m in re.findall("{(.*?)}", s):
        matrices[m].append(s.split('{', 1)[0])

print(dict(matrices))
{'matrixA': ['words1 words2 words3 ', 'words3 words4  '], 'matrixB': ['words1 words2 words3 ']}

This seems to be what you are looking for.
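The `?` in `(.*?)` matters: it makes the group non-greedy, so each `{...}` annotation is matched separately. A greedy `(.*)` would run from the first `{` to the last `}` in the sentence. A small sketch comparing the two on the first example sentence; it uses a raw string with escaped braces, which matches the same text as the answer's pattern but makes the literal braces explicit:

```python
import re

s = "words1 words2 words3 {matrixA} {matrixB}"

# Non-greedy: each match stops at the first "}" after its "{".
lazy = re.findall(r"\{(.*?)\}", s)
# Greedy: the match extends to the last "}" and swallows both annotations.
greedy = re.findall(r"\{(.*)\}", s)

print(lazy)    # ['matrixA', 'matrixB']
print(greedy)  # ['matrixA} {matrixB']
```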

If you don't want the trailing whitespace, append s.split('{', 1)[0].strip() instead; str.strip removes leading and trailing whitespace characters.
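Putting the pieces together, here is a minimal sketch as a reusable function. The name `group_by_annotation` is hypothetical, not part of any library. Like the answer, it assumes the words to collect are everything before the first `{` in each sentence, and it applies the `.strip()` suggested above:

```python
import re
from collections import defaultdict


def group_by_annotation(sentences):
    """Map each {name} annotation to the stripped text before the first '{'.

    Hypothetical helper: assumes the text to collect always precedes
    the first annotation in the sentence.
    """
    matrices = defaultdict(list)
    for s in sentences:
        # Text before the first "{", without leading/trailing whitespace.
        words = s.split("{", 1)[0].strip()
        # Non-greedy match: one name per {...} annotation.
        for name in re.findall(r"\{(.*?)\}", s):
            matrices[name].append(words)
    return dict(matrices)


sentences = ["words1 words2 words3 {matrixA} {matrixB}", "words3 words4  {matrixA}"]
result = group_by_annotation(sentences)
print(result)
# {'matrixA': ['words1 words2 words3', 'words3 words4'], 'matrixB': ['words1 words2 words3']}
```

Computing the words once per sentence, rather than once per annotation, gives the same result as the loop in the answer while avoiding a repeated split when a sentence carries several annotations.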

We found a similar question about python - python: split sentences by annotations on Stack Overflow: https://stackoverflow.com/questions/48247844/