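I am trying to extract some parameters from a list whose structure and length vary. The parameters are the departure and arrival addresses of a route. The list is generated from a natural-language sentence, so it does not follow any fixed template: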

1st example : ['go', 'Buzenval', 'from', 'Chatelet']
2nd example : ['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval']
3rd example : ['go', 'from', '33', 'street', 'Republique', 'to', '12','street','Napoleon']

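For each case I have also created a very similar list, in which the departure and arrival are replaced by the literal words 'departure' and 'arrival'. With the examples above, I get: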
1st example : ['go', 'arrival', 'from', 'departure']
2nd example : ['How', 'go', 'arrival', 'from', 'departure']
3rd example : ['go', 'from', 'departure', 'to', 'arrival']

Now that I have these two kinds of lists, I want to identify the departure and the arrival:
1st example : departure = ['Chatelet'], arrival = ['Buzenval']
2nd example : departure = ['Buzenval'], arrival = ['street','Saint','Augustin']
3rd example : departure = ['33','street','Republique'], arrival = ['12','street','Napoleon']

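Basically, the two lists differ only at the positions of the parameters, but I need to work out which one is the departure and which one is the arrival. I think regex could help me here, but I don't know how.
Thanks for your help!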

Best answer

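I found a way that solves your three examples. The one thing you should change is the variable names; I did not know what to call them. (This is the older version, which is slow and hard to follow. The later one is better.)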

def extract_places(names, modes):
    keywords = set(modes).intersection(names)
    extracted = [[] for _ in modes]
    j = 0
    for i, mode in enumerate(modes):
        if mode.lower() in keywords:
            if mode.lower() != names[j].lower():
                while mode.lower() != names[j].lower():
                    extracted[i - 1].append(names[j])
                    j += 1
            else:
                extracted[i].append(names[j])
                j += 1
        else:
            if names[j].lower() not in keywords:
                while j < len(names) and names[j].lower() not in keywords:
                    extracted[i].append(names[j])
                    j += 1

    extracted = dict(zip(modes, extracted))
    return extracted["arrival"], extracted["departure"]

I found another approach that may be easier to understand. It is also about ten times faster than the first one, so you may want to use it.

def partition(l, word): # Helper to split a list or tuple at a specific element
    i = l.index(word)
    return l[:i], l[i + 1:]

def extract_places(names, modes):
    keywords = set(modes).intersection(names)
    mapped = [(modes, names)]
    for word in keywords:
        new_mapped = []
        for mode,name in mapped:
            if word in mode:
                m1, m2 = partition(mode, word)
                n1, n2 = partition(name, word)
                if m1:
                    new_mapped.append((m1, n1))
                if m2:
                    new_mapped.append((m2, n2))
            else:
                new_mapped.append((mode,name))
        mapped = new_mapped
    mapped = {m[0]: n for m, n in mapped}
    return mapped['arrival'], mapped['departure']
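
To make the splitting idea more concrete, here is a small sketch of what the helper does to the first example when both lists are cut at a shared keyword. The variable names `m1`, `m2`, `n1`, `n2` are only illustrative, and the helper is repeated so the snippet runs on its own.

```python
def partition(l, word):
    # Same helper as in the answer: split a list at the first occurrence of word
    i = l.index(word)
    return l[:i], l[i + 1:]

modes = ['go', 'arrival', 'from', 'departure']
names = ['go', 'Buzenval', 'from', 'Chatelet']

# Cutting both lists at the shared word 'from' keeps the pieces aligned:
# each piece of modes sits next to the matching piece of names.
m1, m2 = partition(modes, 'from')
n1, n2 = partition(names, 'from')

print(m1, n1)  # ['go', 'arrival'] ['go', 'Buzenval']
print(m2, n2)  # ['departure'] ['Chatelet']
```

Repeating the cut for every remaining shared word (here 'go') leaves only pairs such as (['arrival'], ['Buzenval']), which is what the final dictionary is built from.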

Both versions behave exactly the same:
for example in ((['go', 'Buzenval', 'from', 'Chatelet'],
                 ['go', 'arrival', 'from', 'departure']
                 ),
                (['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval'],
                 ['How', 'go', 'arrival', 'from', 'departure']
                 ),
                (['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon'],
                 ['go', 'from', 'departure', 'to', 'arrival']
                 )):
    print(extract_places(*example))

Both print:
(['Buzenval'], ['Chatelet'])
(['street', 'Saint', 'Augustin'], ['Buzenval'])
(['12', 'street', 'Napoleon'], ['33', 'street', 'Republique'])
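
Since the question asks about regex, here is a minimal sketch of that route as well. It is not part of the original answer: the function name `extract_places_regex` and the `placeholders` parameter are hypothetical, and it assumes that no token contains whitespace and that each placeholder appears exactly once in the template. The template is turned into a pattern in which placeholders become named groups and every other word must match literally.

```python
import re

def extract_places_regex(names, modes, placeholders=("departure", "arrival")):
    # Hypothetical helper, not from the original answer.
    parts = []
    for word in modes:
        if word in placeholders:
            # A placeholder may span several tokens, so capture lazily
            parts.append(r"(?P<%s>.+?)" % word)
        else:
            # Any other template word has to appear literally in the sentence
            parts.append(re.escape(word))
    pattern = " ".join(parts)
    # fullmatch forces the last group to extend to the end of the sentence
    match = re.fullmatch(pattern, " ".join(names))
    if match is None:
        raise ValueError("sentence does not fit the template")
    return match.group("arrival").split(), match.group("departure").split()

print(extract_places_regex(
    ['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon'],
    ['go', 'from', 'departure', 'to', 'arrival']))
# (['12', 'street', 'Napoleon'], ['33', 'street', 'Republique'])
```

If a keyword such as 'to' also occurs inside an address, the lazy groups split at its first occurrence, so this sketch is only as reliable as the template itself.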

Regarding "python - extracting departure and arrival from a list", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50230160/
