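I have this code, which should display the syntactic structure of a sentence according to the grammar I defined. However, it returns an empty []. What am I missing or doing wrong?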

import nltk

grammar = nltk.parse_cfg("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP
VP -> V NP | VP PP
N -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' |'cheered'
P -> 'or' | 'and'
""")

def main():
    sent = "Kim arrived or Dana left and everyone cheered".split()
    parser = nltk.ChartParser(grammar)
    trees = parser.nbest_parse(sent)
    for tree in trees:
        print tree

if __name__ == '__main__':
    main()

Best answer

Let's do some reverse engineering:

>>> import nltk
>>> grammar = nltk.parse_cfg("""
... NP -> Det N | Det N PP
... N -> 'Kim' | 'Dana' | 'everyone'
... """)
>>> sent = "Kim".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]

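It seems the rules cannot even recognize the first word as an NP: every NP rule starts with Det, and the grammar has no rule for Det. So let's try injecting NP -> N: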
>>> import nltk
>>> grammar = nltk.parse_cfg("""
... NP -> Det N | Det N PP | N
... N -> 'Kim' | 'Dana' | 'everyone'
... """)
>>> sent = "Kim".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[Tree('NP', [Tree('N', ['Kim'])])]
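
The same diagnosis can be reached without running a parser. Below is a minimal sketch in plain Python (no NLTK) that reads the grammar from the question and reports which nonterminals have no rules and which can derive any words at all. The helper names `read_rules`, `is_terminal` and `productive_symbols` are made up for this illustration and are not part of NLTK.

```python
# Grammar copied from the question (terminals are the quoted words).
GRAMMAR = """
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP
VP -> V NP | VP PP
N -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' | 'cheered'
P -> 'or' | 'and'
"""

def read_rules(text):
    """Return (lhs, rhs_symbols) pairs from lines of the form 'A -> B C | D'."""
    rules = []
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        for alternative in rhs.split("|"):
            rules.append((lhs.strip(), tuple(alternative.split())))
    return rules

def is_terminal(symbol):
    return symbol.startswith("'")

def productive_symbols(rules):
    """Fixpoint: a nonterminal is productive if one of its rules uses only
    terminals and nonterminals already known to be productive."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in productive and all(
                is_terminal(s) or s in productive for s in rhs
            ):
                productive.add(lhs)
                changed = True
    return productive

rules = read_rules(GRAMMAR)
defined = {lhs for lhs, _ in rules}
used = {s for _, rhs in rules for s in rhs if not is_terminal(s)}
undefined = used - defined
productive = productive_symbols(rules)

print("no rules for:", sorted(undefined))
print("productive:", sorted(productive))
print("S can derive a sentence:", "S" in productive)
```

With the question's grammar, Det is the only symbol without rules. Because NP depends on Det, and PP, VP and S all depend on NP, only N, V and P remain productive, so no input at all can be parsed as S.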

Now that this works, let's move on to longer prefixes of the sentence:
>>> import nltk
>>> grammar = nltk.parse_cfg("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | Det N PP | N
... VP -> V NP | VP PP
... N -> 'Kim' | 'Dana' | 'everyone'
... V -> 'arrived' | 'left' |'cheered'
... P -> 'or' | 'and'
... """)
>>> sent = "Kim arrived".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]
>>>
>>> sent = "Kim arrived or".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]

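It seems there is no way to get a VP here, with or without a PP: V requires an NP after it, and VP -> VP PP can only attach a PP to a VP that has already been built. So let's relax the rules and use VP -> V PP instead of VP -> VP PP: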
>>> import nltk
>>> grammar = nltk.parse_cfg("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | Det N PP | N
... VP -> V NP | V PP
... N -> 'Kim' | 'Dana' | 'everyone'
... V -> 'arrived' | 'left' |'cheered'
... P -> 'or' | 'and'
... """)
>>> sent = "Kim arrived or Dana".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[Tree('S', [Tree('NP', [Tree('N', ['Kim'])]), Tree('VP', [Tree('V', ['arrived']), Tree('PP', [Tree('P', ['or']), Tree('NP', [Tree('N', ['Dana'])])])])])]
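
To make the "V requires an NP after it" point concrete, here is a small sketch in plain Python that computes the shortest word string each nonterminal can derive under the earlier grammar (the one with NP -> N added and VP -> V NP | VP PP). The helpers `read_rules` and `shortest_yield` are hypothetical names for this illustration, not NLTK functions.

```python
import math

# Grammar from the step where "Kim arrived" still returned [].
GRAMMAR = """
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | N
VP -> V NP | VP PP
N -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' | 'cheered'
P -> 'or' | 'and'
"""

def read_rules(text):
    """Return (lhs, rhs_symbols) pairs from lines of the form 'A -> B C | D'."""
    rules = []
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        for alternative in rhs.split("|"):
            rules.append((lhs.strip(), tuple(alternative.split())))
    return rules

def shortest_yield(rules):
    """Length of the shortest word string each nonterminal can derive
    (math.inf if it cannot derive anything), computed as a fixpoint."""
    shortest = {lhs: math.inf for lhs, _ in rules}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            # A quoted terminal counts as one word; an undefined symbol
            # such as Det counts as underivable.
            length = sum(
                1 if s.startswith("'") else shortest.get(s, math.inf)
                for s in rhs
            )
            if length < shortest[lhs]:
                shortest[lhs] = length
                changed = True
    return shortest

shortest = shortest_yield(read_rules(GRAMMAR))
for symbol in ("NP", "VP", "S"):
    print(symbol, "needs at least", shortest[symbol], "word(s)")
```

Under that grammar a VP needs at least two words and an S at least three, which is consistent with the two-word input "Kim arrived" returning []. The three-word input "Kim arrived or" fails for a different reason: "or" is a P, not an NP.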

OK, we're getting closer, but the next word seems to break the CFG rules again:
>>> import nltk
>>> grammar = nltk.parse_cfg("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | Det N PP | N
... VP -> V NP | V PP
... N -> 'Kim' | 'Dana' | 'everyone'
... V -> 'arrived' | 'left' |'cheered'
... P -> 'or' | 'and'
... """)
>>> sent = "Kim arrived or Dana left".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]
>>> sent = "Kim arrived or Dana left and".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]
>>>
>>> sent = "Kim arrived or Dana left and everyone".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]
>>>
>>> sent = "Kim arrived or Dana left and everyone cheered".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]
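
The prefix-by-prefix check above can also be reproduced without NLTK. The following is a rough sketch of a bottom-up chart recognizer in plain Python, assuming a grammar with no empty rules. It only answers yes or no and builds no trees, and `read_rules` and `recognizes` are illustrative names rather than library functions.

```python
# The patched grammar from the last transcript (VP -> V NP | V PP).
GRAMMAR = """
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | N
VP -> V NP | V PP
N -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' | 'cheered'
P -> 'or' | 'and'
"""

def read_rules(text):
    """Return (lhs, rhs_symbols) pairs from lines of the form 'A -> B C | D'."""
    rules = []
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        for alternative in rhs.split("|"):
            rules.append((lhs.strip(), tuple(alternative.split())))
    return rules

def recognizes(rules, start, tokens):
    """Bottom-up chart recognition for a CFG without empty rules."""
    n = len(tokens)
    chart = {}  # (i, j) -> nonterminals that derive tokens[i:j]

    def symbol_spans(symbol, i, j):
        if symbol.startswith("'"):  # quoted terminal: must match one token
            return j == i + 1 and tokens[i] == symbol.strip("'")
        return symbol in chart.get((i, j), set())

    def body_spans(rhs, i, j):
        if len(rhs) == 1:
            return symbol_spans(rhs[0], i, j)
        # Try every split that leaves at least one token per remaining symbol.
        return any(
            symbol_spans(rhs[0], i, k) and body_spans(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width
            found = chart.setdefault((i, j), set())
            changed = True
            while changed:  # repeat so unit rules such as NP -> N are applied
                changed = False
                for lhs, rhs in rules:
                    if lhs not in found and body_spans(rhs, i, j):
                        found.add(lhs)
                        changed = True
    return n > 0 and start in chart[(0, n)]

rules = read_rules(GRAMMAR)
sentence = "Kim arrived or Dana left and everyone cheered".split()
results = []
for end in range(1, len(sentence) + 1):
    prefix = sentence[:end]
    accepted = recognizes(rules, "S", prefix)
    results.append(accepted)
    print(" ".join(prefix), "->", accepted)
```

With this grammar only the four-word prefix "Kim arrived or Dana" is accepted, matching the transcripts: NP can only be a single N, so an S can span at most four words.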

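So I hope the examples above show you that patching the rules to absorb linguistic phenomena one word at a time, from left to right, is hard.

Instead of going from left to right to get
[[[[[[[[Kim] arrived] or] Dana] left] and] everyone] cheered]

why not try to write more linguistically sound rules that produce one of these bracketings: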
  • [[[Kim arrived] or [Dana left]] and [everyone cheered]]
  • [[Kim arrived] or [[Dana left] and [everyone cheered]]]

Try the following:
    import nltk
    grammar = nltk.parse_cfg("""
    S -> CP | VP
    CP -> VP C VP | CP C VP | VP C CP
    VP -> NP V
    NP -> 'Kim' | 'Dana' | 'everyone'
    V -> 'arrived' | 'left' |'cheered'
    C -> 'or' | 'and'
    """)
    
    print "======= Kim arrived ========="
    sent = "Kim arrived".split()
    parser = nltk.ChartParser(grammar)
    for t in parser.nbest_parse(sent):
        print t
    
    print "\n======= Kim arrived or Dana left ========="
    sent = "Kim arrived or Dana left".split()
    parser = nltk.ChartParser(grammar)
    for t in parser.nbest_parse(sent):
        print t
    
    print "\n=== Kim arrived or Dana left and everyone cheered ===="
    sent = "Kim arrived or Dana left and everyone cheered".split()
    parser = nltk.ChartParser(grammar)
    for t in parser.nbest_parse(sent):
        print t
    

    [out] :
    ======= Kim arrived =========
    (S (VP (NP Kim) (V arrived)))
    
    ======= Kim arrived or Dana left =========
    (S (CP (VP (NP Kim) (V arrived)) (C or) (VP (NP Dana) (V left))))
    
    === Kim arrived or Dana left and everyone cheered ====
    (S
      (CP
        (CP (VP (NP Kim) (V arrived)) (C or) (VP (NP Dana) (V left)))
        (C and)
        (VP (NP everyone) (V cheered))))
    (S
      (CP
        (VP (NP Kim) (V arrived))
        (C or)
        (CP
          (VP (NP Dana) (V left))
          (C and)
          (VP (NP everyone) (V cheered)))))
    

The solution above shows that CFG rules can be robust enough to capture not only the full sentence but also parts of it.
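
The code in this answer uses the NLTK 2 / Python 2 API. As far as I know, NLTK 3 dropped `nltk.parse_cfg` and `nbest_parse` in favor of `nltk.CFG.fromstring` and `parser.parse`, which returns an iterator of trees. A sketch of the same grammar for Python 3 and NLTK 3 (assumed to be installed) might look like this; `parse_all` is a helper name introduced here for illustration.

```python
import nltk

# Same coordination grammar as above, loaded with the NLTK 3 API.
grammar = nltk.CFG.fromstring("""
S -> CP | VP
CP -> VP C VP | CP C VP | VP C CP
VP -> NP V
NP -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' | 'cheered'
C -> 'or' | 'and'
""")

parser = nltk.ChartParser(grammar)

def parse_all(text):
    """Return every parse tree for a whitespace-tokenized sentence."""
    # parse() yields trees lazily, so collect them into a list.
    return list(parser.parse(text.split()))

for text in (
    "Kim arrived",
    "Kim arrived or Dana left",
    "Kim arrived or Dana left and everyone cheered",
):
    print("=======", text, "=======")
    for tree in parse_all(text):
        print(tree)
```

The full sentence should again come back with two trees, one for each way of grouping the "or" and "and" clauses.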

Source: "python-2.7 - Python and NLTK: How to analyze sentence grammar?" on Stack Overflow: https://stackoverflow.com/questions/20983494/
