Context-free grammars with feature structures in Python

I am trying to generate sentences from a defined grammar with Python. To avoid agreement problems I used feature structures. This is the code I have so far:

```
>>> from __future__ import print_function
>>> import nltk
>>> from nltk.featstruct import FeatStruct
>>> from nltk import grammar, parse
>>> from nltk.parse.generate import generate
>>> from nltk import CFG
>>> g = """
... % start DP
... DP-> D[AGR=[NUM='sg', PERS=3, GND='m']] N[AGR=[NUM='sg', GND='m']]
... D[AGR=[NUM='sg', PERS=3, GND='f']] -> 'une' | 'la'
... D[AGR=[NUM='sg', PERS=3, GND='m']] -> 'un' | 'le'
... D[AGR=[NUM='pl', PERS=3]] -> 'des' | 'les'
... N[AGR=[NUM='sg', GND='m']] -> 'garçon'
... N[AGR=[NUM='pl', GND='m']] -> 'garçons'
... N[AGR=[NUM='sg', GND='f']] -> 'fille'
... N[AGR=[NUM='pl', GND='f']] -> 'filles'
... """
>>> for sentence in generate(grammar, n=30):
...     print(''.join(sentence))
```

This is the output I am getting:

```
unegarçon
unegarçons
unefille
unefilles
lagarçon
lagarçons
lafille
lafilles
ungarçon
ungarçons
unfille
unfilles
legarçon
legarçons
lefille
lefilles
desgarçon
desgarçons
desfille
desfilles
lesgarçon
lesgarçons
lesfille
lesfilles
```

While I am supposed to get output like this:

```
un garçon
le garçon
```

The problems I have are:

1. The agreement is not working: I am getting sentences that do not respect the agreement.
2. There is no space between the two words in the sentence.

What is it that I can't see?

Solution

Let's solve the easy part of the question first.

Q2. There is no space between the two words in the sentence.

You're close when it comes to the printing =)

The problem lies in how you're using the str.join function: the string you call it on is the separator that goes between the items.

```
>>> list_of_str = ['a', 'b', 'c']
>>> ''.join(list_of_str)
'abc'
>>> ' '.join(list_of_str)
'a b c'
>>> '|'.join(list_of_str)
'a|b|c'
```

Q1.
The agreement is not working out: the generated sentences do not respect the agreement.

First warning sign

To produce a feature structure grammar with agreement, there should be a rule that contains something like D[AGR=?a] N[AGR=?a] on the right-hand side (RHS), e.g.

```
NP -> D[AGR=?a] N[AGR=?a]
```

With that missing, there is no real agreement rule in the grammar; see http://www.nltk.org/howto/featgram.html

Now comes the gotcha!

If we look at the nltk.parse.generate code carefully, it merely yields all possible combinations of the terminals, and it seems not to care about the feature structures: https://github.com/nltk/nltk/blob/develop/nltk/parse/generate.py

(I think that's a bug, not a feature, so raising an issue on the NLTK repository would be good.)

So if we do this, it prints all combinations of possible terminals (without caring about the agreement):

```python
from nltk import grammar, parse
from nltk.parse.generate import generate

# If person is always 3rd, we can skip the PERSON feature.
g = """
DP -> D[AGR=?a] N[AGR=?a]
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'
"""

# Use a distinct name so the nltk.grammar module is not shadowed.
french_grammar = grammar.FeatureGrammar.fromstring(g)

print(list(generate(french_grammar, n=30)))
```

[out]:

```
[['un', 'garcon'], ['un', 'fille'], ['une', 'garcon'], ['une', 'fille']]
```

But if we try to parse valid and invalid sentences, the agreement rule kicks in:

```python
from nltk import grammar, parse
from nltk.parse.generate import generate

g = """
DP -> D[AGR=?a] N[AGR=?a]
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'
"""

french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)

trees = parser.parse('une garcon'.split())  # Invalid sentence.
print("Parses for 'une garcon':", list(trees))

trees = parser.parse('un garcon'.split())  # Valid sentence.
print("Parses for 'un garcon':", list(trees))
```

[out]:

```
Parses for 'une garcon': []
Parses for 'un garcon': [Tree(DP[], [Tree(D[AGR=[GND='m', NUM='sg']], ['un']), Tree(N[AGR=[GND='m', NUM='sg']], ['garcon'])])]
```

To enforce the agreement rule at generation time, one possible solution is to parse each generated production and keep only the parseable ones, e.g.

```python
from nltk import grammar, parse
from nltk.parse.generate import generate

g = """
DP -> D[AGR=?a] N[AGR=?a]
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'
"""

french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)

for tokens in list(generate(french_grammar, n=30)):
    parsed_tokens = parser.parse(tokens)
    try:
        first_parse = next(parsed_tokens)  # Check if there's a valid parse.
        print(' '.join(first_parse.leaves()))
    except StopIteration:
        continue
```

[out]:

```
un garcon
une fille
```

I guess the goal is to produce the second-to-last column of:

[table not preserved in this copy]

Without the prepositions:

```python
from nltk import grammar, parse
from nltk.parse.generate import generate

g = """
DP -> D[AGR=?a] N[AGR=?a]
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'
D[AGR=[NUM='sg', GND='m']] -> 'le'
D[AGR=[NUM='sg', GND='f']] -> 'la'
D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'
"""

french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)

# A set removes duplicates such as 'les garcons' generated via two D rules.
valid_productions = set()

for tokens in list(generate(french_grammar, n=30)):
    parsed_tokens = parser.parse(tokens)
    try:
        first_parse = next(parsed_tokens)  # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue

for np in sorted(valid_productions):
    print(np)
```

[out]:

```
la fille
le garcon
les filles
les garcons
un garcon
une fille
```

Now to include the preposition

The TOP (aka START) of the grammar has to have more than one branch. Currently the DP -> D[AGR=?a] N[AGR=?a] rule is at the TOP; to allow for a PP construction, we need something like PHRASE -> DP | PP and have to make the PHRASE non-terminal the new TOP, e.g.

```python
from nltk import grammar, parse
from nltk.parse.generate import generate

g = """
PHRASE -> DP | PP
DP -> D[AGR=?a] N[AGR=?a]
PP -> P[AGR=?a] N[AGR=?a]
P[AGR=[NUM='sg', GND='m']] -> 'du' | 'au'
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'
D[AGR=[NUM='sg', GND='m']] -> 'le'
D[AGR=[NUM='sg', GND='f']] -> 'la'
D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'
"""

french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)

valid_productions = set()

for tokens in list(generate(french_grammar, n=100)):
    parsed_tokens = parser.parse(tokens)
    try:
        first_parse = next(parsed_tokens)  # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue

for np in sorted(valid_productions):
    print(np)
```

[out]:

```
au garcon
du garcon
la fille
le garcon
les filles
les garcons
un garcon
une fille
```

To get everything in the table:

```python
from nltk import grammar, parse
from nltk.parse.generate import generate

g = """
PHRASE -> DP | PP
DP -> D[AGR=?a] N[AGR=?a]
PP -> P[AGR=[GND='m', NUM='sg']] N[AGR=[GND='m', NUM='sg']]
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg', DEF='d']] N[AGR=[GND='f', NUM='sg']]
PP -> P[AGR=[GND=?a, NUM='pl']] N[AGR=[GND=?a, NUM='pl']]
P[AGR=[NUM='sg', GND='m']] -> 'du' | 'au'
P[AGR=[NUM='sg', GND='f']] -> 'de' | 'à'
P[AGR=[NUM='pl']] -> 'des' | 'aux'
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'
D[AGR=[NUM='sg', GND='m', DEF='i']] -> 'un'
D[AGR=[NUM='sg', GND='f', DEF='i']] -> 'une'
D[AGR=[NUM='sg', GND='m', DEF='d']] -> 'le'
D[AGR=[NUM='sg', GND='f', DEF='d']] -> 'la'
D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'
"""

french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)

valid_productions = set()

for tokens in list(generate(french_grammar, n=100000)):
    parsed_tokens = parser.parse(tokens)
    try:
        first_parse = next(parsed_tokens)  # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue

for np in sorted(valid_productions):
    print(np)
```

[out]:

```
au garcon
aux filles
aux garcons
de la fille
des filles
des garcons
du garcon
la fille
le garcon
les filles
les garcons
un garcon
une fille
à la fille
```

Beyond the table

It's also possible to produce de|à un(e) garcon|fille, i.e.

```
de un garcon
de une fille
à un garcon
à une fille
```

I'm not sure whether they're valid French phrases, but if they are, you can underspecify the feminine singular PP rule by removing the DEF feature, changing:

```
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg', DEF='d']] N[AGR=[GND='f', NUM='sg']]
```

to:

```
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg']] N[AGR=[GND='f', NUM='sg']]
```

and then add an additional rule to produce the masculine singular indefinite PP. Note that the P in this rule keeps GND='f' because in this grammar that is the entry carrying the uncontracted 'de' and 'à':

```
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='m', NUM='sg', DEF='i']] N[AGR=[GND='m', NUM='sg']]
```

TL;DR

```python
from nltk import grammar, parse
from nltk.parse.generate import generate

g = """
PHRASE -> DP | PP
DP -> D[AGR=?a] N[AGR=?a]
PP -> P[AGR=[GND='m', NUM='sg']] N[AGR=[GND='m', NUM='sg']]
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg']] N[AGR=[GND='f', NUM='sg']]
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='m', NUM='sg', DEF='i']] N[AGR=[GND='m', NUM='sg']]
PP -> P[AGR=[GND=?a, NUM='pl']] N[AGR=[GND=?a, NUM='pl']]
P[AGR=[NUM='sg', GND='m']] -> 'du' | 'au'
P[AGR=[NUM='sg', GND='f']] -> 'de' | 'à'
P[AGR=[NUM='pl']] -> 'des' | 'aux'
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'
D[AGR=[NUM='sg', GND='m', DEF='i']] -> 'un'
D[AGR=[NUM='sg', GND='f', DEF='i']] -> 'une'
D[AGR=[NUM='sg', GND='m', DEF='d']] -> 'le'
D[AGR=[NUM='sg', GND='f', DEF='d']] -> 'la'
D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'
"""

french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)

valid_productions = set()

for tokens in list(generate(french_grammar, n=100000)):
    parsed_tokens = parser.parse(tokens)
    try:
        first_parse = next(parsed_tokens)  # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue

for np in sorted(valid_productions):
    print(np)
```

[out]:

```
au garcon
aux filles
aux garcons
de la fille
de un garcon
de une fille
des filles
des garcons
du garcon
la fille
le garcon
les filles
les garcons
un garcon
une fille
à la fille
à un garcon
à une fille
```
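The grammars above lean on underspecification: an entry such as P[AGR=[NUM='pl']] leaves GND unset, so it can combine with either gender, while a number or gender clash blocks the parse. As a rough illustration of why, here is a minimal sketch using `FeatStruct.unify` from `nltk.featstruct`. The variable names are made up for this example, and the flat feature bundles only stand in for the nested AGR values in the grammar; this is not how the chart parser is wired internally, just the unification idea it relies on.

```python
from nltk.featstruct import FeatStruct

# Hypothetical feature bundles mirroring the AGR values used in the grammars.
plural_any_gender = FeatStruct(NUM='pl')        # like P[AGR=[NUM='pl']] -> 'des' | 'aux'
fem_plural = FeatStruct(NUM='pl', GND='f')      # like 'filles'
masc_singular = FeatStruct(NUM='sg', GND='m')   # like 'garcon' or 'un'
fem_singular = FeatStruct(NUM='sg', GND='f')    # like 'fille' or 'une'

# An underspecified structure unifies with a more specific, compatible one.
compatible = plural_any_gender.unify(fem_plural)

# Conflicting values make unification fail; unify() then returns None.
clash_number = plural_any_gender.unify(masc_singular)
clash_gender = masc_singular.unify(fem_singular)

print(compatible)
print(clash_number)
print(clash_gender)
```

The `clash_gender` case corresponds to 'un fille' or 'une garcon': the shared variable ?a in DP -> D[AGR=?a] N[AGR=?a] cannot be bound to both values at once, so the parser finds no tree and the parse-based filter drops that phrase.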
