
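Problem description

If I create an AnnotationPipeline with a TokenizerAnnotator, WordsToSentencesAnnotator, POSTaggerAnnotator, and SUTime, I get TimexAnnotations attached to the resulting annotation.

But if I create a StanfordCoreNLP pipeline with the "annotators" property set to "tokenize,ssplit,pos,lemma,ner", I don't get TimexAnnotations, even though the relevant individual tokens are NER-tagged as DATE.

Why is there this difference?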

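Solution

With the entitymentions annotator in the pipeline, all entity mentions are extracted from the document, and a DATE counts as an entity mention. Here is some sample code. I've added some commented-out options in case you only want to extract time expressions and want the TimexAnnotations.class field to be populated.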

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.util.*;
import edu.stanford.nlp.time.TimeAnnotations;

import edu.stanford.nlp.pipeline.*;

import java.util.*;

public class SUTimeExample {

  public static void main(String[] args) {
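    Annotation document =
        new Annotation("The date is 1 April 2017");
    Properties props = new Properties();
    // To extract only time expressions and populate TimexAnnotations.class,
    // use these two lines instead of the "annotators" line below:
    //props.setProperty("customAnnotatorClass.time", "edu.stanford.nlp.time.TimeAnnotator");
    //props.setProperty("annotators", "tokenize,ssplit,pos,lemma,time");
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);
    // entitymentions stores the grouped mentions on the document; the field is
    // absent (null) if that annotator did not run, e.g. with the "time" options above
    List<CoreMap> mentions = document.get(CoreAnnotations.MentionsAnnotation.class);
    if (mentions != null) {
      for (CoreMap entityMention : mentions) {
        // Keep only the mentions whose entity type is DATE
        if ("DATE".equals(entityMention.get(CoreAnnotations.EntityTypeAnnotation.class)))
          System.out.println(entityMention);
      }
    }
    // Assuming the "time" options above are enabled instead, the document-level
    // TimexAnnotations field can be read like this:
    //for (CoreMap timex : document.get(TimeAnnotations.TimexAnnotations.class)) {
    //  System.out.println(timex + " -> " + timex.get(TimeAnnotations.TimexAnnotation.class).value());
    //}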
  }
}
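The difference between per-token DATE tags and a DATE entity mention can be illustrated without CoreNLP. The sketch below is a simplified, hypothetical stand-in (it is not how CoreNLP's entitymentions annotator is implemented): it merges each run of consecutive tokens that share the same non-"O" NER tag into one mention. The class name MentionGroupingSketch, the method groupMentions, and the NER tags for the sample sentence are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified illustration only: groups consecutive tokens that carry
// the same non-"O" NER tag into a single mention. CoreNLP's real entitymentions
// annotator handles more cases; this just shows token tags -> mention spans.
public class MentionGroupingSketch {

    // One grouped mention: its surface text and its entity type (e.g. DATE)
    record Mention(String text, String type) {}

    static List<Mention> groupMentions(String[] tokens, String[] nerTags) {
        if (tokens.length != nerTags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<Mention> mentions = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            String tag = nerTags[i];
            if ("O".equals(tag)) { // "O" means the token is outside any entity
                i++;
                continue;
            }
            // Extend the span while the following tokens carry the same tag
            StringBuilder text = new StringBuilder(tokens[i]);
            int j = i + 1;
            while (j < tokens.length && tag.equals(nerTags[j])) {
                text.append(' ').append(tokens[j]);
                j++;
            }
            mentions.add(new Mention(text.toString(), tag));
            i = j;
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Assumed token-level tags for the sample sentence used in the answer
        String[] tokens = {"The", "date", "is", "1", "April", "2017"};
        String[] tags = {"O", "O", "O", "DATE", "DATE", "DATE"};
        for (Mention m : groupMentions(tokens, tags)) {
            System.out.println(m.type() + ": " + m.text());
        }
    }
}
```

Running main prints DATE: 1 April 2017, i.e. three DATE-tagged tokens become one DATE mention. The record syntax requires Java 16 or later.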


10-09 23:27