Problem description
The NLTK documentation is rather poor for this integration. The steps I followed were:
- Download http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip to /home/me/stanford
- Download http://nlp.stanford.edu/software/stanford-spanish-corenlp-2015-01-08-models.jar to /home/me/stanford
Then, in an ipython console:
In [11]: import nltk
In [12]: nltk.__version__
Out[12]: '3.1'
In [13]: from nltk.tag import StanfordNERTagger
Then:
st = StanfordNERTagger('/home/me/stanford/stanford-postagger-full-2015-04-20.zip', '/home/me/stanford/stanford-spanish-corenlp-2015-01-08-models.jar')
But when I tried to run it:
st.tag('Adolfo se la pasa corriendo'.split())
Error: no se ha encontrado o cargado la clase principal edu.stanford.nlp.ie.crf.CRFClassifier
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-14-0c1a96b480a6> in <module>()
----> 1 st.tag('Adolfo se la pasa corriendo'.split())
/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/tag/stanford.py in tag(self, tokens)
64 def tag(self, tokens):
65 # This function should return list of tuple rather than list of list
---> 66 return sum(self.tag_sents([tokens]), [])
67
68 def tag_sents(self, sentences):
/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/tag/stanford.py in tag_sents(self, sentences)
87 # Run the tagger and get the output
88 stanpos_output, _stderr = java(cmd, classpath=self._stanford_jar,
---> 89 stdout=PIPE, stderr=PIPE)
90 stanpos_output = stanpos_output.decode(encoding)
91
/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/__init__.py in java(cmd, classpath, stdin, stdout, stderr, blocking)
132 if p.returncode != 0:
133 print(_decode_stdoutdata(stderr))
--> 134 raise OSError('Java command failed : ' + str(cmd))
135
136 return (stdout, stderr)
OSError: Java command failed : ['/usr/bin/java', '-mx1000m', '-cp', '/home/nanounanue/Descargas/stanford-spanish-corenlp-2015-01-08-models.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier', '-loadClassifier', '/home/nanounanue/Descargas/stanford-postagger-full-2015-04-20.zip', '-textFile', '/tmp/tmp6y169div', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf8']
The same happens with StanfordPOSTagger.
NOTE: I need this to be the Spanish version. NOTE: I am running this in Python 3.4.3.
Recommended answer
Try:
# StanfordPOSTagger
from nltk.tag.stanford import StanfordPOSTagger
stanford_dir = '/home/me/stanford/stanford-postagger-full-2015-04-20/'
modelfile = stanford_dir + 'models/english-bidirectional-distsim.tagger'
jarfile = stanford_dir + 'stanford-postagger.jar'
st = StanfordPOSTagger(model_filename=modelfile, path_to_jar=jarfile)
# NERTagger
from nltk.tag.stanford import StanfordNERTagger
stanford_dir = '/home/me/stanford/stanford-ner-2015-04-20/'
jarfile = stanford_dir + 'stanford-ner.jar'
modelfile = stanford_dir + 'classifiers/english.all.3class.distsim.crf.ser.gz'
st = StanfordNERTagger(model_filename=modelfile, path_to_jar=jarfile)
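Once a tagger object is constructed, both classes share the same interface: pass a pre-tokenized sentence (a list of strings) to tag() and get back a list of (token, tag) pairs. A minimal, hypothetical check, assuming the paths above exist on your machine (the sentence and the output shown are only illustrative):
# Any whitespace-tokenized English text works here.
tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
print(st.tag(tokens))
# e.g. [('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ...] for the NER tagger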
For detailed information on NLTK API with Stanford tools, take a look at: https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software#stanford-tagger-ner-tokenizer-and-parser
Note: The NLTK APIs are for the individual Stanford tools; if you're using Stanford Core NLP, it's best to follow @dimazest's instructions at http://www.eecs.qmul.ac.uk/~dm303/stanford-dependency-parser-nltk-and-anaconda.html
As for Spanish NER tagging, I strongly suggest that you use Stanford Core NLP (http://nlp.stanford.edu/software/corenlp.shtml) instead of the Stanford NER package (http://nlp.stanford.edu/software/CRF-NER.shtml), and follow the @dimazest solution for reading the JSON output.
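If you do go the Core NLP route, one way to get at its JSON output is to query a running CoreNLP server over HTTP. This is not the @dimazest recipe, only a minimal sketch; it assumes a CoreNLP server with the Spanish models is already listening on localhost:9000 and that the requests library is installed:
import json
import requests  # third-party HTTP client, assumed available

text = 'Adolfo se la pasa corriendo'
# Ask the server for tokenization, sentence splitting, POS and NER, returned as JSON.
properties = {'annotators': 'tokenize,ssplit,pos,ner', 'outputFormat': 'json'}
response = requests.post('http://localhost:9000/',
                         params={'properties': json.dumps(properties)},
                         data=text.encode('utf-8'))
for sentence in response.json()['sentences']:
    for token in sentence['tokens']:
        print(token['word'], token['ner'])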
Alternatively, if you must use the NER package, you can try following the instructions at https://github.com/alvations/nltk_cli (disclaimer: this repo is not officially affiliated with NLTK). On the Unix command line, do the following:
cd $HOME
wget http://nlp.stanford.edu/software/stanford-spanish-corenlp-2015-01-08-models.jar
unzip stanford-spanish-corenlp-2015-01-08-models.jar -d stanford-spanish
cp stanford-spanish/edu/stanford/nlp/models/ner/* /home/me/stanford/stanford-ner-2015-04-20/ner/classifiers/
Then, in Python:
# NERTagger
from nltk.tag.stanford import StanfordNERTagger
stanford_dir = '/home/me/stanford/stanford-ner-2015-04-20/'
jarfile = stanford_dir + 'stanford-ner.jar'
modelfile = stanford_dir + 'classifiers/spanish.ancora.distsim.s512.crf.ser.gz'
st = StanfordNERTagger(model_filename=modelfile, path_to_jar=jarfile)
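With the Spanish classifier loaded, the sentence from the question should tag without the Java error (the output below only indicates the expected shape, not guaranteed labels):
st.tag('Adolfo se la pasa corriendo'.split())
# e.g. [('Adolfo', 'PERS'), ('se', 'O'), ('la', 'O'), ('pasa', 'O'), ('corriendo', 'O')]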