Creating a .conll file as output of the Stanford Parser

Problem description

I want to use the Stanford Parser to create a .conll file for further processing. So far I have managed to parse the test sentence with this command:

stanford-parser-full-2013-06-20/lexparser.sh  stanford-parser-full-2013-06-20/data/testsent.txt > output.txt

Instead of a txt file, I would like to have a file in .conll format. I'm pretty sure this is possible, as it is mentioned in the documentation (see here). Can I somehow modify my command, or will I have to write Java code?

Thanks for your help!

Solution

If you're looking for dependencies printed out in CoNLL X (CoNLL 2006) format, try this from the command line:

java -mx150m -cp "stanford-parser-full-2013-06-20/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz stanford-parser-full-2013-06-20/data/testsent.txt >testsent.tree

java -mx150m -cp "stanford-parser-full-2013-06-20/*:" edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile testsent.tree -conllx
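If you would rather drive both steps from a script and end up with an actual .conll file on disk, the two commands can be wrapped roughly as follows. This is only a sketch: the function names and the output file name testsent.conll are made up for illustration, and it assumes java is on the PATH and that the script runs from the directory containing stanford-parser-full-2013-06-20. The Java arguments themselves are copied unchanged from the commands above.

```python
import subprocess

PARSER_DIR = "stanford-parser-full-2013-06-20"
# Passed to java literally; the JVM expands the "*" wildcard itself.
CLASSPATH = f"{PARSER_DIR}/*:"


def tree_command(input_txt):
    """Build the first command: plain text in, Penn trees out."""
    return [
        "java", "-mx150m", "-cp", CLASSPATH,
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-outputFormat", "penn",
        "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
        input_txt,
    ]


def conllx_command(tree_file):
    """Build the second command: Penn trees in, CoNLL-X rows out."""
    return [
        "java", "-mx150m", "-cp", CLASSPATH,
        "edu.stanford.nlp.trees.EnglishGrammaticalStructure",
        "-treeFile", tree_file, "-conllx",
    ]


def run_to_file(command, out_path):
    """Run a command and redirect its standard output to out_path."""
    with open(out_path, "w", encoding="utf-8") as out:
        subprocess.run(command, stdout=out, check=True)


def parse_to_conll(input_txt, conll_path, tree_path="testsent.tree"):
    """Hypothetical helper: run both steps and leave a .conll file behind."""
    run_to_file(tree_command(input_txt), tree_path)
    run_to_file(conllx_command(tree_path), conll_path)


if __name__ == "__main__":
    parse_to_conll(f"{PARSER_DIR}/data/testsent.txt", "testsent.conll")
```

Run directly from the shell as in the two commands above, the second step prints the CoNLL-X rows to standard output rather than to a file.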

Here's the output for the first test sentence:

1       Scores        _       NNS     NNS     _       4       nsubj        _       _
2       of            _       IN      IN      _       0       erased       _       _
3       properties    _       NNS     NNS     _       1       prep_of      _       _
4       are           _       VBP     VBP     _       0       root         _       _
5       under         _       IN      IN      _       0       erased       _       _
6       extreme       _       JJ      JJ      _       8       amod         _       _
7       fire          _       NN      NN      _       8       nn           _       _
8       threat        _       NN      NN      _       4       prep_under   _       _
9       as            _       IN      IN      _      13       mark         _       _
10      a             _       DT      DT      _      12       det          _       _
11      huge          _       JJ      JJ      _      12       amod         _       _
12      blaze         _       NN      NN      _      15       xsubj        _       _
13      continues     _       VBZ     VBZ     _       4       advcl        _       _
14      to            _       TO      TO      _      15       aux          _       _
15      advance       _       VB      VB      _      13       xcomp        _       _
16      through       _       IN      IN      _       0       erased       _       _
17      Sydney        _       NNP     NNP     _      20       poss         _       _
18      's            _       POS     POS     _       0       erased       _       _
19      north-western _       JJ      JJ      _      20       amod         _       _
20      suburbs       _       NNS     NNS     _      15       prep_through _       _
21      .             _       .       .       _       4       punct        _       _
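For the "further processing" part, output like the above can be read back into a program with a few lines of code. The following is a minimal sketch, not part of the Stanford tools: the function names are hypothetical, the column names follow the usual CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and it assumes columns are separated by whitespace and sentences by blank lines.

```python
FIELDS = ["id", "form", "lemma", "cpostag", "postag",
          "feats", "head", "deprel", "phead", "pdeprel"]


def parse_conllx(text):
    """Split CoNLL-X text into sentences, each a list of row dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
        row = dict(zip(FIELDS, cols))
        row["id"] = int(row["id"])
        row["head"] = int(row["head"])
        current.append(row)
    if current:
        sentences.append(current)
    return sentences


def find_root(sentence):
    """Return the row labelled 'root', or None if there is none.

    In the output above, 'erased' tokens also have head 0, so the
    relation label is checked instead of the head index.
    """
    for row in sentence:
        if row["deprel"] == "root":
            return row
    return None


if __name__ == "__main__":
    sample = (
        "1 Scores _ NNS NNS _ 4 nsubj _ _\n"
        "2 of _ IN IN _ 0 erased _ _\n"
        "3 properties _ NNS NNS _ 1 prep_of _ _\n"
        "4 are _ VBP VBP _ 0 root _ _\n"
    )
    first = parse_conllx(sample)[0]
    print(find_root(first)["form"])
```

Looking up the root by its relation label rather than by head 0 matters here because the collapsed dependencies mark prepositions and the possessive marker as erased with head 0 as well.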

