java - How to save a Stanford NLP CoreDocument to disk

After creating an annotated CoreDocument, I want to save it to disk and retrieve it later.

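Computing an annotated CoreDocument is slow. Once it has been created, I would like to reuse it later, i.e. load it back from disk instead of annotating again.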

    import java.util.Properties;
    import edu.stanford.nlp.pipeline.CoreDocument;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    // set up pipeline properties
    Properties props = new Properties();
    props.setProperty("annotators",
        "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");
    // set a property for an annotator; here the coref annotator is set to use the neural algorithm
    props.setProperty("coref.algorithm", "neural");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object (content is the input text, a String)
    CoreDocument document = new CoreDocument(content);
    // annotate the document
    pipeline.annotate(document);

Best answer

You should take a look at the AnnotationSerializer class:

https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/pipeline/AnnotationSerializer.html

Specifically, although this class has several implementations, we mainly use ProtobufAnnotationSerializer.

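You can see usage examples in some of the integration tests. ProtobufSerializationSanityITest is a simple example of how to use it. ProtobufAnnotationSerializerSlowITest is a more thorough but more complex example. You can find both in the GitHub repository.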
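
As a rough illustration of the approach described above, here is a minimal sketch that writes the Annotation wrapped by a CoreDocument to a file with ProtobufAnnotationSerializer and rebuilds a CoreDocument from it later. The class name CoreDocumentStore and its save/load methods are made up for this example and are not part of CoreNLP. It also assumes that every annotation produced by your pipeline round-trips through the protobuf format, which is worth checking for the annotators you actually use.

```java
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationSerializer;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer;
import edu.stanford.nlp.util.Pair;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for this example; not part of CoreNLP.
public class CoreDocumentStore {

    // Serialize the Annotation that backs an already annotated CoreDocument.
    public static void save(CoreDocument document, Path file) throws IOException {
        AnnotationSerializer serializer = new ProtobufAnnotationSerializer();
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file))) {
            serializer.write(document.annotation(), out);
        }
    }

    // Read the Annotation back and wrap it in a new CoreDocument.
    public static CoreDocument load(Path file) throws IOException, ClassNotFoundException {
        AnnotationSerializer serializer = new ProtobufAnnotationSerializer();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            // read() returns the Annotation together with the stream it was read from
            Pair<Annotation, InputStream> result = serializer.read(in);
            return new CoreDocument(result.first());
        }
    }
}
```

With the pipeline from the question, this could be called as CoreDocumentStore.save(document, path) right after pipeline.annotate(document), and CoreDocumentStore.load(path) on later runs, where path is any file location you choose. If the serializer rejects an annotation it does not know how to store, the ProtobufAnnotationSerializer(boolean) constructor can be used with false to relax the lossless check, at the cost of silently dropping that annotation.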