Problem description
In NLTK, how do I traverse a parsed sentence to return a list of noun phrase strings?
I have two goals:
(1) Build a list of the noun phrases instead of printing them with the traverse() method. At present I capture the output of the existing traverse() with StringIO, which is not an acceptable solution.
(2) De-parse each noun phrase string, so that '(NP Michael/NNP Jackson/NNP)' becomes 'Michael Jackson'. Is there a method in NLTK to de-parse?
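For goal (2), a chunk produced by nltk.RegexpParser is an nltk.Tree whose leaves are (word, tag) tuples, so de-parsing amounts to keeping the words and dropping the tags. This works on the Tree object itself rather than on its printed string, so no output capture is needed. A minimal sketch, assuming the chunk is shaped like the one in the question; np_to_string is a hypothetical helper name. NLTK's nltk.tag.untag, which strips the tags from a list of (word, tag) pairs, gives the same result.

```python
from nltk.tag import untag
from nltk.tree import Tree

# An NP chunk shaped like RegexpParser output: its leaves are (word, tag) tuples
np_chunk = Tree('NP', [('Michael', 'NNP'), ('Jackson', 'NNP')])

def np_to_string(chunk):
    # Hypothetical helper: keep each leaf's word, drop its POS tag
    return ' '.join(word for word, tag in chunk.leaves())

print(np_to_string(np_chunk))               # Michael Jackson
print(' '.join(untag(np_chunk.leaves())))   # Michael Jackson (built-in untag)
```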
The NLTK documentation recommends using traverse() to view the Noun Phrase, but how do I capture the 't' in this recursive method so I generate a list of string Noun Phrases?
import nltk
from nltk.tag import pos_tag

def traverse(t):
    try:
        t.label()
    except AttributeError:
        # Leaves are (word, tag) tuples and have no label()
        return
    else:
        if t.label() == 'NP':
            print(t)  # or do something else
        else:
            for child in t:
                traverse(child)

def nounPhrase(tagged_sent):
    # tagged_sent is a list of (word, part-of-speech) tuples
    # Define several tag patterns
    grammar = r"""
    NP: {<DT|PP\$>?<JJ>*<NN>}   # chunk determiner/possessive, adjectives and noun
        {<NNP>+}                # chunk sequences of proper nouns
        {<NN>+}                 # chunk consecutive nouns
    """
    cp = nltk.RegexpParser(grammar)  # Define parser
    SentenceTree = cp.parse(tagged_sent)
    NounPhrases = traverse(SentenceTree)  # collect noun phrases
    return NounPhrases

sentence = "Michael Jackson likes to eat at McDonalds"
tagged_sent = pos_tag(sentence.split())  # List of (word, part-of-speech) tuples
NP = nounPhrase(tagged_sent)
print(NP)
This presently prints:
(NP Michael/NNP Jackson/NNP)
(NP McDonalds/NNP)
and stores None in NP.
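NP ends up as None because traverse() has no return statement, so nounPhrase() passes back the implicit None. One way to capture each t in the recursion is to pass a list down through the recursive calls and append to it. A minimal sketch, assuming hand-written POS tags in place of pos_tag() (which needs a downloaded tagger model); it also swaps the try/except AttributeError test for an isinstance(t, Tree) check, which separates subtrees from (word, tag) leaves in the same way.

```python
import nltk
from nltk.tree import Tree

def traverse(t, found=None):
    """Recursively collect NP subtrees as strings instead of printing them."""
    if found is None:
        found = []  # fresh accumulator on the outermost call
    if not isinstance(t, Tree):
        return found  # a (word, tag) leaf: nothing to collect
    if t.label() == 'NP':
        found.append(' '.join(word for word, tag in t.leaves()))
    else:
        for child in t:
            traverse(child, found)  # children append into the same list
    return found

grammar = r"""
NP: {<DT|PP\$>?<JJ>*<NN>}
    {<NNP>+}
    {<NN>+}
"""
# Tags written out by hand (assumption) so no tagger model download is needed
tagged_sent = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'),
               ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]
tree = nltk.RegexpParser(grammar).parse(tagged_sent)
print(traverse(tree))  # ['Michael Jackson', 'McDonalds']
```

Using None as the default and creating the list inside the function avoids Python's shared-mutable-default pitfall, so repeated calls do not accumulate results from earlier sentences.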
Recommended answer
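import nltk

def extract_np(psent):
    # Walk every subtree; keep only those labelled NP
    for subtree in psent.subtrees():
        if subtree.label() == 'NP':
            # leaves() gives (word, tag) pairs; join the words, drop the tags
            yield ' '.join(word for word, tag in subtree.leaves())

# grammar and tagged_sent as defined in the question above
cp = nltk.RegexpParser(grammar)
parsed_sent = cp.parse(tagged_sent)
for npstr in extract_np(parsed_sent):
    print(npstr)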
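extract_np() is a generator, so for goal (1) wrapping it in list() or a list comprehension yields the list directly. Tree.subtrees() also accepts a filter argument, which lets the label test move into the call. A sketch of the question's nounPhrase() rewritten this way; noun_phrase_list and GRAMMAR are hypothetical names, and the tags are again written by hand rather than produced by pos_tag().

```python
import nltk

GRAMMAR = r"""
NP: {<DT|PP\$>?<JJ>*<NN>}
    {<NNP>+}
    {<NN>+}
"""

def noun_phrase_list(tagged_sent):
    # Chunk the tagged sentence, then keep only NP subtrees via subtrees(filter=...)
    tree = nltk.RegexpParser(GRAMMAR).parse(tagged_sent)
    return [' '.join(word for word, tag in chunk.leaves())
            for chunk in tree.subtrees(filter=lambda st: st.label() == 'NP')]

# Hand-written tags (assumption) standing in for pos_tag() output
tagged_sent = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'),
               ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]
print(noun_phrase_list(tagged_sent))  # ['Michael Jackson', 'McDonalds']
```

The root tree is labelled 'S', so the filter skips it and only the NP chunks are returned.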