java - Extracting lemmas from a tree in the Stanford Parser

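I am using the Stanford Parser in my implementation.
I want to use a sentence's parse tree to extract various pieces of information.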

I am using the code from "Get certain nodes out of a Parse Tree":

I have my CoreMap sentence and the corresponding tree:

Tree t = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
for (Tree subtree : t) {
    String pos = subtree.label().value();
    String wd = subtree.firstChild().label().value();
    Integer wdIndex = ??  // how do I get the token index for this node?
    CoreLabel token = sentence.get(CoreAnnotations.TokensAnnotation.class).get(wdIndex);
}

I can't extract the lemma. Does anyone know how to do it?

I tried the following code. It works, but it generates some warnings and is not very clean:

Annotation a = new Annotation("geese");
ss.pipeline.annotate(a);
CoreMap se = a.get(CoreAnnotations.SentencesAnnotation.class).get(0);
CoreLabel token = se.get(CoreAnnotations.TokensAnnotation.class).get(0);
String lemma = token.get(CoreAnnotations.LemmaAnnotation.class);
System.out.println(lemma); // goose

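Does anyone have a suggestion?

Thanks!

Best answer

I had the same problem, but I solved it with a HashMap that pairs each leaf with its leaf index. This code prints the lemma of each matched noun.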

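        import java.util.IdentityHashMap;
        import java.util.List;
        import java.util.Map;
        import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
        import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
        import edu.stanford.nlp.ling.CoreLabel;
        import edu.stanford.nlp.trees.Tree;
        import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
        import edu.stanford.nlp.trees.tregex.TregexMatcher;
        import edu.stanford.nlp.trees.tregex.TregexPattern;

        // `sentence` is the CoreMap of one annotated sentence.
        List<CoreLabel> tokens = sentence.get(TokensAnnotation.class);
        Tree tree = sentence.get(TreeAnnotation.class);
        // Match the noun part-of-speech tags.
        TregexPattern pattern = TregexPattern.compile("NNP | NNS | NN | NNPS");
        TregexMatcher matcher = pattern.matcher(tree);

        // Map every leaf to its position in the sentence. The map is keyed by
        // node identity because two leaves holding the same word may compare
        // as equal, which would overwrite the earlier index in a HashMap.
        Map<Tree, Integer> leafDict = new IdentityHashMap<>();
        int i = 0;
        for (Tree leaf : tree.getLeaves()) {
            leafDict.put(leaf, i);
            i++;
        }

        while (matcher.find()) {
            // The matched node is a POS tag, so its first child is the word leaf.
            int index = leafDict.get(matcher.getMatch().firstChild());
            String result = tokens.get(index).get(LemmaAnnotation.class);
            System.out.println(result);
        }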

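This solution only works when the matched node sits exactly one level above a leaf, that is, when it is a part-of-speech tag node.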
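
One way around that limitation is to collect all leaves under the matched node instead of taking only its first child, and then look up each leaf's index. The sketch below shows the idea in plain Java so it runs without CoreNLP. The Node class and the helper names (collectLeaves, indexLeaves, tokenIndices) are hypothetical stand-ins, not part of the Stanford API. With the real library you would presumably call getLeaves() on the matched Tree and use each resulting index with tokens.get(index), as in the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class LeafIndexSketch {

    /** Hypothetical stand-in for a parse-tree node; NOT CoreNLP's Tree class. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Appends the leaves under node to out, in left-to-right order. */
    static void collectLeaves(Node node, List<Node> out) {
        if (node.isLeaf()) {
            out.add(node);
            return;
        }
        for (Node child : node.children) {
            collectLeaves(child, out);
        }
    }

    /** Maps each leaf, by object identity, to its position in the sentence. */
    static Map<Node, Integer> indexLeaves(Node root) {
        List<Node> leaves = new ArrayList<>();
        collectLeaves(root, leaves);
        Map<Node, Integer> index = new IdentityHashMap<>();
        for (int i = 0; i < leaves.size(); i++) {
            index.put(leaves.get(i), i);
        }
        return index;
    }

    /** Returns the token indices covered by match, whatever its depth. */
    static List<Integer> tokenIndices(Node match, Map<Node, Integer> leafIndex) {
        List<Node> leaves = new ArrayList<>();
        collectLeaves(match, leaves);
        List<Integer> result = new ArrayList<>();
        for (Node leaf : leaves) {
            result.add(leafIndex.get(leaf));
        }
        return result;
    }

    public static void main(String[] args) {
        // (S (NP (DT the) (NNS geese)) (VP (VBD chased) (NP (DT the) (NNS geese))))
        Node np1 = new Node("NP",
                new Node("DT", new Node("the")),
                new Node("NNS", new Node("geese")));
        Node np2 = new Node("NP",
                new Node("DT", new Node("the")),
                new Node("NNS", new Node("geese")));
        Node root = new Node("S", np1,
                new Node("VP", new Node("VBD", new Node("chased")), np2));

        Map<Node, Integer> leafIndex = indexLeaves(root);
        System.out.println(tokenIndices(np1, leafIndex)); // [0, 1]
        System.out.println(tokenIndices(np2, leafIndex)); // [3, 4]
    }
}
```

The example sentence repeats "the" and "geese" on purpose: because the map is keyed by identity, the second noun phrase still resolves to indices 3 and 4 rather than to those of the first occurrence.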