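python - nltk NER word extraction

I have checked previous related threads, but they did not solve my problem. I have written code to get named entities (NER) from text.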

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."

tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary = True)
print namedEnt
namedEnt = nltk.ne_chunk(tagged, binary = False)

This gives the result below:

(S
  (NE Stallone/NNP)
  jason/NN
  's/POS
  film/NN
  (NE Rocky/NNP)
  was/VBD
  inducted/VBN
  into/IN
  the/DT
  (NE National/NNP Film/NNP Registry/NNP)
  as/IN
  well/RB
  as/IN
  having/VBG
  its/PRP$
  film/NN
  props/NNS
  placed/VBN
  in/IN
  the/DT
  (NE Smithsonian/NNP Museum/NNP)
  ./.)

whereas I want only the named entities, like this:

Stallone
Rocky
National Film Registry
Smithsonian Museum

How can I do this?

Update

result = ' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"
print result

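This gives a syntax error. What is the correct way to write it?

Update 2

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."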

tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary = True)
print namedEnt
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
print np

Error:

 np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
  File "/usr/local/lib/python2.7/dist-packages/nltk/tree.py", line 198, in _get_node
    raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.

So I tried

np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.label() == "NE"]

which gives an empty result.

Best answer

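The returned namedEnt is actually a Tree object, which is a subclass of list. You can parse it as follows (on NLTK 3 the node label is read with label(); NLTK 2 used the .node attribute, which now raises the NotImplementedError shown in the question):

[' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.label() == "NE"]

Output: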

['Stallone', 'Rocky', 'National Film Registry', 'Smithsonian Museum']
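The comprehension can be checked without downloading any tokenizer, tagger, or chunker models by building the chunk tree by hand. The following is a minimal sketch, assuming NLTK 3 is installed (where the label is read with `label()`). The helper name `named_entities` is hypothetical, and the tree is an abbreviated hand-built stand-in for the `ne_chunk` output printed in the question, not a real chunker result.

```python
from nltk.tree import Tree


def named_entities(chunked, label="NE"):
    """Return the text of every subtree whose label equals `label`."""
    return [
        # Each leaf is a (word, pos_tag) tuple; keep only the word.
        " ".join(word for word, tag in subtree.leaves())
        for subtree in chunked.subtrees()
        if subtree.label() == label
    ]


# Hand-built stand-in for nltk.ne_chunk(tagged, binary=True),
# abbreviated from the tree printed in the question.
chunked = Tree("S", [
    Tree("NE", [("Stallone", "NNP")]),
    ("jason", "NN"),
    ("'s", "POS"),
    ("film", "NN"),
    Tree("NE", [("Rocky", "NNP")]),
    ("was", "VBD"),
    ("inducted", "VBN"),
    ("into", "IN"),
    ("the", "DT"),
    Tree("NE", [("National", "NNP"), ("Film", "NNP"), ("Registry", "NNP")]),
])

print(named_entities(chunked))
```

The root `S` node is also yielded by `subtrees()`, but it is filtered out because its label is not `NE`.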

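Setting the binary flag to True only indicates whether a subtree is a named entity or not, which is all we need above. When it is set to False, the chunker gives more information, such as whether the entity is an Organization, a Person, and so on. For some reason, the results with the flag on and off do not seem to agree with each other.

A similar question about python - nltk NER word extraction can be found on Stack Overflow: https://stackoverflow.com/questions/26862970/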
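With binary=False the subtree labels are entity types rather than a single NE label, so filtering on "NE" finds nothing. One possible approach, sketched below under the assumption of NLTK 3, is to collect every chunk directly under the root together with its label. The helper name `typed_entities` is hypothetical, and the tree and its PERSON and ORGANIZATION labels are made up for illustration; they are not the chunker's actual output for the question's sentence.

```python
from nltk.tree import Tree


def typed_entities(chunked):
    """Return (label, text) pairs for every chunk directly under the root."""
    pairs = []
    for child in chunked:
        # Plain tokens are (word, pos_tag) tuples; chunks are Tree objects.
        if isinstance(child, Tree):
            text = " ".join(word for word, tag in child.leaves())
            pairs.append((child.label(), text))
    return pairs


# Illustrative hand-built tree; labels are assumed, not real chunker output.
chunked = Tree("S", [
    Tree("PERSON", [("Stallone", "NNP")]),
    ("visited", "VBD"),
    ("the", "DT"),
    Tree("ORGANIZATION", [("Smithsonian", "NNP"), ("Museum", "NNP")]),
    (".", "."),
])

print(typed_entities(chunked))
```

Iterating over the root's children instead of calling `subtrees()` avoids having to know the entity labels in advance.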