Summary of Common Natural Language Parsers | jiangwen127


http://baojie.org/blog/2014/06/16/nlp-parser/

Feature summary table

Features	Satisfied by	Note
Web-scale parsing: should handle TB-scale or larger text volumes efficiently at both training and parsing time	Link, MiniPar, Malt, DeSR, MST, pfp, MBSP	Linear-time parsing is generally possible with dependency parsing; parallelism support is also important
Potentially support both statistical and knowledge-based parsing	Link, NLTK, Malt, DepParse, MBSP
High accuracy	Stanford, Collins and Bikel, Berkeley, Charniak-Johnson, RASP, Malt, Link, DeSR, MST, pfp, Senna
Active development	Stanford, Berkeley, Link, NLTK, Malt, DeSR, pfp, MBSP, OpenNLP, Senna
Production-friendly license	Link, NLTK, RASP, Malt, DepParse, OpenNLP	Some GPL-licensed parsers can still be used in production behind a web service without open-sourcing the rest of the system
Good documentation	Stanford, Link, NLTK, Malt, DeSR, MBSP, OpenNLP
Code Reusability: easy-to-use API or easy-to-understand code	Stanford, Link, NLTK, MiniPar, DeSR, DepParse, pfp, MBSP, Senna
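
The note on linear-time dependency parsing can be illustrated with a minimal sketch of the arc-standard shift-reduce scheme, the general family that MaltParser and DeSR belong to. This is not code from either project: the function name `arc_standard_oracle`, the toy sentence and its gold heads are assumptions made for illustration, and a real parser would replace the gold-head oracle with a trained classifier. The point it shows is that a sentence of n words is parsed in exactly 2n transitions (n shifts plus n arc operations).

```python
def arc_standard_oracle(heads):
    """Replay arc-standard transitions for a projective gold tree.

    heads[i] is the gold head of token i (tokens are 1-based);
    index 0 stands for the artificial ROOT and heads[0] is ignored.
    Returns the transition sequence and the (head, dependent) arcs.
    """
    n = len(heads) - 1
    # pending[h] = number of dependents of h that are not attached yet.
    pending = [0] * (n + 1)
    for dep in range(1, n + 1):
        pending[heads[dep]] += 1

    stack, next_tok = [0], 1  # ROOT starts on the stack
    transitions, arcs = [], []
    while next_tok <= n or len(stack) > 1:
        if len(stack) >= 2:
            s1, s0 = stack[-2], stack[-1]
            # LEFT-ARC: second-from-top becomes a dependent of the top.
            if s1 != 0 and heads[s1] == s0 and pending[s1] == 0:
                arcs.append((s0, s1))
                pending[s0] -= 1
                del stack[-2]
                transitions.append("LEFT-ARC")
                continue
            # RIGHT-ARC: top becomes a dependent of second-from-top,
            # but only once the top has collected all of its own dependents.
            if heads[s0] == s1 and pending[s0] == 0:
                arcs.append((s1, s0))
                pending[s1] -= 1
                stack.pop()
                transitions.append("RIGHT-ARC")
                continue
        if next_tok > n:
            raise ValueError("tree is not projective; arc-standard cannot derive it")
        # SHIFT: move the next input token onto the stack.
        stack.append(next_tok)
        next_tok += 1
        transitions.append("SHIFT")
    return transitions, arcs


# Toy sentence (assumed): 1=She 2=eats 3=fresh 4=fish
transitions, arcs = arc_standard_oracle([-1, 2, 0, 4, 2])
print(len(transitions), transitions)
print(sorted(arcs))
```

Each token is shifted once and reduced once, so the loop body runs 2n times regardless of the tree shape, which is where the linear bound comes from.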

Detailed comparison


Parser	Internationalization	Feature Summary	Links	Active Project
Stanford Parser; constituency and dependency; Java, with Python and Ruby interfaces; GPL license; by Chris Manning et al.	English, Chinese, German, Arabic, Italian, Bulgarian, and Portuguese	Part of the Stanford CoreNLP toolkit; a package of three kinds of parsers: a PCFG (probabilistic context-free grammar) parser, a lexicalized dependency parser, and a lexicalized PCFG parser; parsing accuracy ranks consistently high in surveys; good documentation; the PCFG parser is based on the CKY algorithm; however, the dependency parser is an exhaustive dependency parser with O(n^4) complexity, much slower than linear-time O(n) dependency parsers	Homepage: http://nlp.stanford.edu/software/lex-parser.shtml ; Download: http://nlp.stanford.edu/software/stanford-parser-2012-07-09.tgz ; Online test: http://nlp.stanford.edu:8080/parser/ ; Javadoc: http://nlp.stanford.edu/nlp/javadoc/javanlp/	Yes (frequent releases)
Collins and Bikel Parser; constituency parser; Java; free for research; by Dan Bikel (UPenn) and Mike Collins (Columbia)	English, Chinese, Arabic	An improvement of the Collins parser; based on the CYK algorithm (source code); lexicalized PCFG; state-of-the-art performance for English	Homepage: http://www.cis.upenn.edu/~dbikel/software.html#stat-parser ; Download: http://www.cis.upenn.edu/~dbikel/download.html ; Javadoc: http://www.cis.upenn.edu/~dbikel/dbparser-doc/overview-summary.html	No (since 2008)
Berkeley Parser; constituency parser; Java; GPL; by Slav Petrov and Dan Klein	English, Bulgarian, Arabic, Chinese, French, German	Based on hierarchical coarse-to-fine parsing, where a sequence of grammars is considered; no need for language-specific adaptations; automatically induced PCFG; state-of-the-art performance for English on the Penn Treebank	Project homepage: http://code.google.com/p/berkeleyparser/ ; Online test: http://tomato.banatao.berkeley.edu:8080/parser/parser.html	Yes (infrequent changes)
Charniak-Johnson Parser; constituency parser; C; by Eugene Charniak (Brown Univ) and Mark Johnson	English	Based on discriminative reranking and dynamic programming; lexicalized n-best PCFG: for each sentence, a heuristic coarse-to-fine generative parser constructs a set of 50-best parses; the reranker feature weights are estimated using MaxEnt, averaged perceptron, etc.; state-of-the-art performance on English	Current C-J parser (2011): http://web.science.mq.edu.au/~mjohnson/code/reranking-parser-2011-12-17.tgz ; Original (2005) Charniak parser: ftp://ftp.cs.brown.edu/pub/nlparser/	Yes (infrequent changes)
Link Grammar Parser; dependency parser; C, with bindings for Ruby, Python, Perl, Java and OCaml; BSD license; by Davy Temperley, John Lafferty and Daniel Sleator (CMU), Dom Lachowicz and Linas Vepstas (AbiWord)	Persian, Arabic, Chinese, German, Russian	Based on link grammar lexicons (similar to IBM Watson's English slot grammar parser); its English dictionary has 70k+ words; produces both dependencies (labelled links connecting pairs of words) and constituents (Penn Treebank style phrase trees); performance is comparable to the Stanford PCFG parsing model, and it is 3+ times faster than the Stanford lexicalized model; 10+ extensions, including FrameNet-style framing, reference (anaphora) resolution and natural language generation; however, it is grammar-rigid and may fail when the sentence is grammatically incomplete or non-compliant; very good documentation	Original CMU page: http://www.link.cs.cmu.edu/link/ ; Project page: http://www.abisource.com/projects/link-grammar/ (part of the Open Cognition project); Online test: http://www.link.cs.cmu.edu/link/submit-sentence-4.html ; SVN: http://svn.abisource.com/link-grammar/ ; API: http://www.abisource.com/projects/link-grammar/api/index.html ; Documentation: http://www.abisource.com/projects/link-grammar/dict/index.html	Yes (frequent releases)
NLTK Parser; constituency and dependency; Python; Apache License; by Steven Bird	English, German, Chinese, Japanese	Very good documentation, with various books available; widely adopted in education and web application development; very easy to use, with a clean API; part of a complete set of NLP tools covering major NLP needs; constituency parser with PCFG; dependency parser using a shift-reduce algorithm, based on CFG; however, its parser implementation is less optimized	Project homepage: http://nltk.org/ ; Source code: http://nltk.org/api/nltk.parse.html#module-nltk.parse ; Book: Natural Language Processing with Python; Book: Python Text Processing with NLTK 2.0 Cookbook	Yes (very active)
MiniPar; dependency parser; C and Lisp, with Java binding in GATE; free of charge for non-commercial use; by Dekang Lin	English	One of the early dependency parsers; after 15+ years, it is only slightly worse than state-of-the-art parsers; code is small and easy to extend; its dependency scheme may be useful in designing a new parser	Homepage and download: http://webdocs.cs.ualberta.ca/~lindek/minipar.htm/	No (since 1994)
RASP; constituency and dependency; C and Common Lisp; LGPL; by John Carroll et al. (Sussex and Cambridge)	English	RASP = Robust Accurate Statistical Parsing; fully domain-independent; automated training; integration of statistical techniques and incremental grammar rule induction; state-of-the-art performance	Homepage: http://www.informatics.susx.ac.uk/research/groups/nlp/rasp/project.html ; Download: http://ilexir.co.uk/applications/rasp/download/	Yes (infrequent releases)
MaltParser; dependency parser; Java, with Python binding in NLTK; commercial-friendly license; by Johan Hall, Jens Nilsson and Joakim Nivre	English, French, Swedish	Shift-reduce algorithm (automaton-based); inductive dependency parsing that learns from a treebank; very fast: linear-time parsing; state-of-the-art accuracy	Project home: http://www.maltparser.org/ ; Javadoc: http://www.maltparser.org/api/index.html	Yes (frequent releases)
DeSR; dependency parser; C++ with Python binding; GPL; by Giuseppe Attardi	Italian, English, French, and 10+ others	Part of the Tanl project; shift-reduce dependency parser that can handle non-projective dependencies; deterministic parsing, very fast (linear time); fully labeled dependency trees; training with multi-layer perceptron, averaged perceptron, maximum entropy, SVM, or memory-based learning using TiMBL; among the best on English labeled dependency parsing	Project homepage: https://sites.google.com/site/desrparser/ ; Code: http://sourceforge.net/projects/desr/ ; SVN: http://desr.svn.sourceforge.net/viewvc/desr/trunk/ ; API: http://medialab.di.unipi.it/Project/QA/Parser/doc/ ; Online test: http://paleo.di.unipi.it/it/parse and http://medialab.di.unipi.it/Project/QA/Parser/sim.html	Yes (frequent releases)
MSTParser; dependency parser; Java; by Jason Baldridge and Ryan McDonald (UPenn)	English, Chinese and 10+ other languages	MST = Maximum Spanning Tree, based on a graph algorithm; supports online learning; state-of-the-art performance, comparable to MaltParser; outperforms MaltParser on longer dependencies, but is typically slower	Project homepage: http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html ; SVN: http://mstparser.svn.sourceforge.net/viewvc/mstparser/	No (since 2007)
DepParse; dependency parser; Python; MIT License; by Leif Johnson (UT Austin)	English	Includes a maximum spanning tree (MST) parser and a stack-based shift-reduce parser; supports data parallelism on multicore machines; performance has not been evaluated; self-contained, easy to extend	Project homepage: http://code.google.com/p/python-depparse/ ; Source: http://code.google.com/p/python-depparse/source/browse/	No (since 2010)
pfp; constituency parser; C++ and Python; GPL; by Erik Frey, Norman Casagrande et al. (Wavii Inc)	English	pfp = pretty fast statistical parser; uses a PCFG grammar and the CYK algorithm; 3-4x faster than the Stanford parser, and uses 5-8x less resident memory; thread-safe, with multi-core support	Homepage: https://github.com/wavii/pfp/blob/master/README.md	Yes [1]
MBSP; shallow (dependency) parsing; Python; GPL and commercial	English	Memory-Based Shallow Parser, based on the TiMBL and MBT memory-based learning applications; no need for manual pattern or grammar definition; client-server architecture; performs shallow parsing; shares an API with Pattern; can be used together with DeSR and NLTK	Homepage: http://www.clips.ua.ac.be/pages/MBSP	Yes
OpenNLP Parser; constituency parser; Java; Apache License (an Apache project)	English	A chunking parser (relatively simple); can be used with UIMA	Project homepage: http://opennlp.apache.org/ ; Source SVN: http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/parser/	Yes
Senna; constituency parser; C; non-commercial license	English	Uses deep learning; very small code base (3500 lines); syntactic parsing; state-of-the-art performance	Pro
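
Several rows above (Stanford, Collins and Bikel, pfp) describe constituency parsers built on a PCFG with the CKY/CYK algorithm. As a rough illustration of that technique only, and not of any of these implementations, here is a minimal Viterbi CKY sketch. The toy grammar in `BINARY` and `LEXICAL`, its probabilities, and the function name `cky_best_logprob` are all assumptions; the grammar must be in Chomsky normal form for this version to apply.

```python
import math

# Toy PCFG in Chomsky normal form (hypothetical rules and probabilities).
BINARY = {  # (left child, right child) -> [(parent, probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
    ("Det", "N"): [("NP", 0.6)],
}
LEXICAL = {  # word -> [(preterminal, probability)]
    "she": [("NP", 0.4)],
    "eats": [("V", 1.0)],
    "the": [("Det", 1.0)],
    "fish": [("N", 1.0)],
}


def cky_best_logprob(words, start="S"):
    """Return the log-probability of the best parse rooted in `start`,
    or None if the toy grammar cannot derive the sentence."""
    n = len(words)
    # chart[i][j] maps a nonterminal to the best log-prob of deriving words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag, p in LEXICAL.get(word, []):
            chart[i][i + 1][tag] = math.log(p)
    # Fill longer spans from shorter ones: O(n^3) span/split combinations.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[i][j]
            for k in range(i + 1, j):  # split point
                for left, lp in chart[i][k].items():
                    for right, rp in chart[k][j].items():
                        for parent, p in BINARY.get((left, right), []):
                            score = lp + rp + math.log(p)
                            if score > cell.get(parent, -math.inf):
                                cell[parent] = score
    return chart[0][n].get(start)


best = cky_best_logprob("she eats the fish".split())
print(math.exp(best))
print(cky_best_logprob("fish the eats".split()))
```

The three nested loops over span width, start position and split point give the cubic cost in sentence length, which is the contrast with the linear-time shift-reduce dependency parsers listed in the table.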

09-06 11:28