Problem description
Stanford Parser is now 'thread-safe' as of version 2.0 (02.03.2012). I am currently running the command line tools and cannot figure out how to make use of my multiple cores by threading the program.
In the past, this question has been answered with "Stanford Parser is not thread-safe", as the FAQ still says. I am hoping to find someone who has had success threading the latest version.
I have tried using the -t flag (-t10 and -tLLP), since that was all I could find in my searches, but both throw errors.
An example of a command I issue is:
java -cp stanford-parser.jar edu.stanford.nlp.parser.lexparser.LexicalizedParser \
-outputFormat "oneline" ./grammar/englishPCFG.ser.gz ./corpus > corpus.lex
Recommended answer
Starting with version 2.0.5, you can now easily use multiple threads with the option -nthreads k. For example, your command can be like this:
java -mx6g edu.stanford.nlp.parser.lexparser.LexicalizedParser -nthreads 4 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz file.txt > file.stp
(Releases of version 2 prior to 2013 had no way to enable multithreading from the command line; it was only possible when using the API.)
Internally, you can simultaneously run as many parsing threads inside one JVM process as you want. You can do this either by getting and using multiple LexicalizedParserQuery objects (via the parserQuery() method) or implicitly by calling apply(...) or parseTree(...) on a single LexicalizedParser. The -nthreads k option does this for you by sending successive sentences to different parsers using the Executor framework. You can also create multiple LexicalizedParser instances at the same time, e.g., for parsing different languages.
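For anyone going the API route, here is a rough sketch of the Executor pattern described above: successive sentences are submitted to a fixed-size thread pool and the results are collected in input order. It does not depend on the Stanford Parser jar. The class name ParallelParseSketch, the method parseAll, and the parseOne function are all made up for illustration; parseOne is a placeholder for a real per-sentence parse call (for example one made through a query object obtained from parserQuery()), and this is not the parser's actual internal implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelParseSketch {

    // Submits each sentence to a fixed-size thread pool and returns the
    // results in the same order as the input.
    // ASSUMPTION: parseOne is a hypothetical stand-in for a real parser call.
    public static List<String> parseAll(List<String> sentences, int nThreads,
                                        Function<String, String> parseOne) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String sentence : sentences) {
                // Each sentence becomes one independent task.
                futures.add(pool.submit(() -> parseOne.apply(sentence)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                // get() blocks until that task is done, so order is preserved.
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "This is a test .",
                "Parsing can run in parallel .");
        // Placeholder "parser" that just wraps the sentence in brackets.
        List<String> trees = parseAll(sentences, 4, s -> "(ROOT " + s + ")");
        trees.forEach(System.out::println);
    }
}
```

In real use you would replace the lambda passed to parseAll with a call into the parser, keeping in mind that every concurrently running task needs its own working memory.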
Multiple LexicalizedParserQuery objects share the same grammar (LexicalizedParser), but the memory savings aren't huge, as most of the memory goes to the transient structures used in chart parsing. So, if you are running lots of parsing threads concurrently, you will need to give a lot of memory to the JVM, as in the example above.
p.s. Sorry, yes, some of the documentation still needs updating. But -tLPP is one flag for specifying language-specific resources. The Stanford Parser has no -t flag.