java - Stanford CoreNLP sentiment without sentence splitting

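I have some files that I am feeding to CoreNLP's sentiment tagger. I have already broken the files into individual sentences, so I would like one label returned for each file. How do I get the java command to return a single label?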

The command looks like this: java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin, and it produces the following output:

Annotation pipeline timing information:
TokenizerAnnotator: 0.0 sec.
WordsToSentencesAnnotator: 0.0 sec.
TOTAL: 0.0 sec. for 8 tokens at 296.3 tokens/sec.
Pipeline setup: 0.0 sec.
Total time for StanfordCoreNLP pipeline: 8.7 sec.

C:\stanford-corenlp-full-2015-04-20>java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin
Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.4 sec].
Adding annotator sentiment
Reading in text from stdin.
Please enter one sentence per line.
Processing will end when EOF is reached.

Computer is fun. Not too fun.
  Positive
  Neutral

How do I get a single label, like the output below, which is what I get when I remove the punctuation between the sentences:

Computer is fun Not too fun.
  Positive

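It seems I should be able to do this easily with -ssplit.isOneSentence, since as far as I can tell the sentiment tagger uses ssplit, but I can't work out how to rewrite the command to include it (I have read the command line documentation).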

Best answer

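It looks like there is a bug in SentimentPipeline: with the -stdin option it should not split a line into multiple sentences. I have now fixed this, but unless you compile your own version, that won't help you until we release the next version of CoreNLP.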

However, there is an alternative (and probably better) way to get sentiment labels for sentences using the CoreNLP pipeline.

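The following command runs the same code as your command, but it also lets you specify more options for the individual annotators, including -ssplit.eolonly, which splits sentences only at line ends so that each input line is treated as one sentence.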

java -cp "*" -mx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators "tokenize,ssplit,parse,sentiment" -ssplit.eolonly
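Since -ssplit.eolonly treats each line as one sentence, getting one label per file means each file's text has to sit on a single line. Here is a minimal sketch of that preprocessing step. It assumes UTF-8 input files, and OneLinePerDocument and toSingleLine are hypothetical names made up for this illustration; they are not part of CoreNLP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not part of CoreNLP): prints each input file as one line,
// so that a splitter which breaks only at line ends sees one unit per file.
class OneLinePerDocument {

    // Collapse line breaks and any other run of whitespace into a single space.
    static String toSingleLine(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is a path to one input file (assumed UTF-8).
        for (String path : args) {
            String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
            System.out.println(toSingleLine(text));
        }
    }
}
```

Run with the file names as arguments, this writes one line per file to standard output, in the same order as the arguments. An empty file comes out as an empty line, so you may want to filter those out before handing the text to the pipeline.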

Source: Stack Overflow, https://stackoverflow.com/questions/34483978/