http://baojie.org/blog/2014/06/16/nlp-parser/

Feature summary table

Features | Satisfied by | Note
Web-scale parsing: for both training and parsing time, should be able to handle TB or higher text volume efficiently | Link, MiniPar, Malt, DeSR, MST, pfp, MBSP | Linear-time parsing is generally possible with dependency parsing; parallelism support is also important
Potentially support both statistical and knowledge-based parsing | Link, NLTK, Malt, DepParse, MBSP |
High accuracy | Stanford, Collins and Bikel, Berkeley, Charniak-Johnson, RASP, Malt, Link, DeSR, MST, pfp, Senna |
Active development | Stanford, Berkeley, Link, NLTK, Malt, DeSR, pfp, MBSP, OpenNLP, Senna |
Production-friendly license | Link, NLTK, RASP, Malt, DepParse, OpenNLP | Some of the GPL-licensed parsers can still be used in production behind a web service without open-sourcing the rest of the system
Good documentation | Stanford, Link, NLTK, Malt, DeSR, MBSP, OpenNLP |
Code reusability: easy-to-use API or easy-to-understand code | Stanford, Link, NLTK, MiniPar, DeSR, DepParse, pfp, MBSP, Senna |
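To see at a glance which parsers satisfy the most criteria, the summary table can be tallied with a few lines of Python. This is only a convenience sketch: the short feature keys are my own labels, and the parser sets are copied from the rows above.

```python
from collections import Counter

# Feature -> parsers that satisfy it, copied from the summary table.
# The short keys are made-up labels for the seven rows.
FEATURES = {
    "web_scale": {"Link", "MiniPar", "Malt", "DeSR", "MST", "pfp", "MBSP"},
    "statistical_and_knowledge": {"Link", "NLTK", "Malt", "DepParse", "MBSP"},
    "high_accuracy": {"Stanford", "Collins and Bikel", "Berkeley",
                      "Charniak-Johnson", "RASP", "Malt", "Link", "DeSR",
                      "MST", "pfp", "Senna"},
    "active_development": {"Stanford", "Berkeley", "Link", "NLTK", "Malt",
                           "DeSR", "pfp", "MBSP", "OpenNLP", "Senna"},
    "production_license": {"Link", "NLTK", "RASP", "Malt", "DepParse",
                           "OpenNLP"},
    "good_documentation": {"Stanford", "Link", "NLTK", "Malt", "DeSR",
                           "MBSP", "OpenNLP"},
    "code_reusability": {"Stanford", "Link", "NLTK", "MiniPar", "DeSR",
                         "DepParse", "pfp", "MBSP", "Senna"},
}

# Count how many of the seven criteria each parser meets.
counts = Counter()
for parsers in FEATURES.values():
    counts.update(parsers)

# Parsers listed under every single criterion.
satisfies_all = {p for p, n in counts.items() if n == len(FEATURES)}

for parser, n in counts.most_common():
    print(f"{parser}: {n}/{len(FEATURES)}")
```

By this count Link is the only parser listed under all seven criteria, and Malt misses only code reusability.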

 

Detailed comparison

This table is quite wide; click the print or pdf button at the top of the page to see the whole table.

Parser | Internationalization | Feature Summary | Links | Active Project
Stanford Parser


  • Constituency and dependency
  • Java, with Python and Ruby interfaces
  • GPL license
  • By Chris Manning et al
English, Chinese, German, Arabic, Italian, Bulgarian, and Portuguese
  • Part of Stanford Core NLP Toolkit
  • It is a package of three kinds of parsers: a PCFG (probabilistic context-free grammar) parser, a lexicalized dependency parser, and a lexicalized PCFG parser
  • Parsing accuracy ranks consistently high in surveys
  • Good documentation
  • The PCFG parser is based on the CKY algorithm
  • However, the dependency parser is an exhaustive dependency parser with O(n^4) complexity, which makes it much slower than linear-time O(n) dependency parsers
Yes (frequent releases)
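Several parsers in this table (Stanford PCFG, Collins and Bikel, pfp) are described as CKY/CYK based. As a rough illustration of what that means, here is a minimal probabilistic CKY sketch. The toy grammar, its probabilities and the function name `cky` are all invented for this example; none of it is taken from any of the parsers listed.

```python
import math

# Toy grammar in Chomsky normal form. Rules and probabilities are made up.
BINARY = {  # (left child, right child) -> [(parent, probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
    ("Det", "N"): [("NP", 0.6)],
}
LEXICAL = {  # word -> [(tag, probability)]
    "dogs": [("NP", 0.4)],
    "chase": [("V", 1.0)],
    "the": [("Det", 1.0)],
    "cat": [("N", 1.0)],
}

def cky(words):
    """Return (log-probability, tree) of the best S parse, or None."""
    n = len(words)
    # chart[i][j] maps a nonterminal to its best (log-prob, tree) over words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag, p in LEXICAL.get(word, []):
            chart[i][i + 1][tag] = (math.log(p), (tag, word))
    for span in range(2, n + 1):            # span length
        for i in range(n - span + 1):       # span start
            j = i + span
            cell = chart[i][j]
            for k in range(i + 1, j):       # split point
                for b, (lp_b, tree_b) in chart[i][k].items():
                    for c, (lp_c, tree_c) in chart[k][j].items():
                        for a, p in BINARY.get((b, c), []):
                            lp = math.log(p) + lp_b + lp_c
                            if a not in cell or lp > cell[a][0]:
                                cell[a] = (lp, (a, tree_b, tree_c))
    return chart[0][n].get("S")

best = cky("dogs chase the cat".split())
print(best)
```

The three nested loops over span, start and split point give the usual O(n^3) behaviour (times a grammar-dependent factor), which is why CKY-based constituency parsers are typically slower than the linear-time dependency parsers further down the table.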
Collins and Bikel Parser


  • Constituency parser
  • Java
  • Free for research
  • By Dan Bikel (UPenn) and Mike Collins (Columbia)
English, Chinese, Arabic
  • An improved version of the Collins parser
  • Based on the CYK algorithm (source code)
  • Lexicalized PCFG
  • state-of-the-art performance for English
No (since 2008)
Berkeley parser


  • Constituency parser
  • Java
  • GPL
  • Slav Petrov and Dan Klein
English, Bulgarian, Arabic, Chinese, French, German
  • Based on hierarchical coarse-to-fine parsing, where a sequence of grammars is considered
  • Uses an automatically induced PCFG, so no language-specific adaptations are needed
  • state-of-the-art performance for English on the Penn Treebank
Yes (infrequent changes)
Charniak-Johnson Parser


  • Constituency parser
  • C
  • Eugene Charniak (Brown Univ) and Mark Johnson
English
  • Based on discriminative reranking, dynamic programming
  • Lexicalized N-best PCFG: for each sentence, it constructs a set of 50-best parses using a heuristic coarse-to-fine generative parser
  • The reranker feature weights are estimated using MaxEnt, Averaged Perceptron, etc
  • State of the art performance on English
Yes (infrequent changes)
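The reranking idea in the Charniak-Johnson entry can be sketched as follows: a base parser proposes n-best candidates, and a linear model trained with an averaged perceptron picks among them. This is a toy illustration only. The feature names (`logp`, `bad_attach`), the data and the function names are hypothetical, and the real reranker uses a much richer feature set.

```python
from collections import defaultdict

def score(weights, feats):
    """Linear score of one candidate parse given its feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def rerank(weights, candidates):
    """Index of the highest-scoring candidate."""
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))

def train_reranker(data, epochs=5):
    """Averaged perceptron over n-best lists.

    data: list of (candidates, oracle_index); each candidate is a feature dict
    and oracle_index marks the candidate closest to the gold parse.
    """
    w = defaultdict(float)      # current weights
    total = defaultdict(float)  # running sum of weights for averaging
    steps = 0
    for _ in range(epochs):
        for candidates, oracle in data:
            pred = rerank(w, candidates)
            if pred != oracle:
                # Move towards the oracle parse, away from the wrong guess.
                for f, v in candidates[oracle].items():
                    w[f] += v
                for f, v in candidates[pred].items():
                    w[f] -= v
            for f, v in w.items():
                total[f] += v
            steps += 1
    return {f: v / steps for f, v in total.items()}

# Hypothetical 2-best lists: base-parser log-probability plus one made-up
# indicator feature for a suspicious attachment.
data = [
    ([{"logp": -1.0, "bad_attach": 1.0}, {"logp": -1.5, "bad_attach": 0.0}], 1),
    ([{"logp": -2.0, "bad_attach": 1.0}, {"logp": -2.2, "bad_attach": 0.0}], 1),
]
weights = train_reranker(data)
print(weights)
```

Averaging the weights over all training steps, rather than keeping only the final vector, is what distinguishes the averaged perceptron from the plain one and usually makes it less sensitive to the order of the training examples.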
Link Grammar Parser


  • Dependency parser
  • C, with bindings for Ruby, Python, Perl, Java and OCaml
  • BSD license
  • Davy Temperley, John Lafferty and Daniel Sleator (CMU)
  • Dom Lachowicz, Linas Vepstas (AbiWord)
Persian, Arabic, Chinese, German, Russian
  • Based on lexicons of link grammar (similar to IBM Watson’s English slot grammar parser). Its English dictionary has 70k+ words
  • Produces both dependencies (labelled links connecting pairs of words) and constituents (Penn Treebank style phrase trees)
  • Performance is comparable to the Stanford PCFG parsing model, and is 3+ times faster than the Stanford lexicalized model.
  • 10+ extensions, including FrameNet-style framing, reference (anaphora) resolution and natural language generation
  • However, it is grammar-rigid and may fail when the sentence is grammatically incomplete or ungrammatical
  • Very good documentation
Yes (frequent releases)
NLTK Parser


  • Constituency and dependency
  • Python
  • Apache License
  • Steven Bird
English, German, Chinese, Japanese
  • Very good documentation, various books available. Widely adopted in education and web application development
  • Very easy to use, clean API interface
  • Part of whole set of NLP tools covering major NLP needs
  • Constituency parser with PCFG
  • Dependency parser using a shift-reduce algorithm, based on CFG
  • However, its parser implementation is less optimized
Yes (very active)
MiniPar


  • Dependency parser
  • C and Lisp, with Java binding in GATE
  • free of charge for non-commercial use
  • Dekang Lin
English
  • One of the early dependency parsers
  • After 15+ years, it is slightly worse than state-of-the-art parsers
  • Code is small and easy to extend
  • Its dependency relations may be useful when designing a new parser
No (since 1994)
RASP


  • C and Common Lisp
  • Constituency and dependency
  • LGPL
  • John Carroll et al (Sussex and Cambridge)
English
  • RASP = Robust Accurate Statistical Parsing
  • fully domain-independent automated training
  • integration of statistical techniques and incremental grammar rule induction
  • state-of-the-art performance
Yes (infrequent releases)
MaltParser


English, French, Swedish
  • Shift-reduce algorithm (automaton-based)
  • Inductive dependency parsing that learns from a treebank
  • Very fast: linear time parsing
  • State-of-the-art performance on accuracy
Yes (frequent releases)
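To show why shift-reduce dependency parsing runs in linear time, here is a small sketch of an arc-standard transition system, one common shift-reduce variant. It is not MaltParser's code: the function name and the example sentence are made up, and instead of a classifier learned from a treebank it uses an oracle that reads the gold heads, which is how training sequences are usually derived.

```python
from collections import deque

def arc_standard_oracle(heads):
    """Replay the arc-standard transitions that build a projective tree.

    heads[i] is the gold head of token i (tokens are 1-based, 0 is the
    artificial root; heads[0] is a placeholder). Returns (transitions, arcs),
    where arcs is a set of (head, dependent) pairs.
    """
    n = len(heads) - 1
    stack, buffer = [0], deque(range(1, n + 1))
    arcs, transitions = set(), []
    # Number of dependents each token still has to collect.
    pending = [0] * (n + 1)
    for dep in range(1, n + 1):
        pending[heads[dep]] += 1
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            if below != 0 and heads[below] == top:
                # LEFT-ARC: the top of the stack becomes head of the item below.
                arcs.add((top, below))
                del stack[-2]
                pending[top] -= 1
                transitions.append("LEFT-ARC")
                continue
            if heads[top] == below and pending[top] == 0:
                # RIGHT-ARC: only once top has collected all its dependents.
                arcs.add((below, top))
                stack.pop()
                pending[below] -= 1
                transitions.append("RIGHT-ARC")
                continue
        if not buffer:
            raise ValueError("tree is non-projective or malformed")
        stack.append(buffer.popleft())
        transitions.append("SHIFT")
    return transitions, arcs

# "She eats fresh fish": She <- eats, eats <- ROOT, fresh <- fish, fish <- eats
heads = [0, 2, 0, 4, 2]
transitions, arcs = arc_standard_oracle(heads)
print(transitions)
```

Every token is shifted exactly once and removed from the stack exactly once, so a sentence of n words always takes 2n transitions. In a trained parser, a classifier would choose each transition from the current stack and buffer instead of consulting `heads`.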
DeSR


  • Dependency parser
  • C++ with Python binding
  • GPL
  • Giuseppe Attardi
Italian, English, French, and 10+ others
  • Part of the Tanl project
  • shift-reduce dependency parser, can handle non-projective dependencies
  • Deterministic parsing, very fast (linear time)
  • fully labeled dependency trees
  • training with Multi Layer Perceptron, Averaged Perceptron, Maximum Entropy, SVM, memory-based learning using TiMBL
  • Among the best on English labeled dependency parsing
Yes (frequent releases)
MSTParser


  • Dependency parser
  • Java
  • Jason Baldridge and Ryan McDonald (UPenn)
English, Chinese and 10+ other languages
  • MST = Maximum-Spanning Tree, based on graph algorithm
  • Supports online learning
  • State-of-the-art performance, comparable to MaltParser
  • Outperforms MaltParser on longer dependencies, but is typically slower
No (since 2007)
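Graph-based parsing of the kind MSTParser is named after treats every possible head-dependent pair as a scored edge and looks for the highest-scoring tree rooted at an artificial root node. A standard way to do this for directed graphs is the Chu-Liu/Edmonds algorithm, sketched below. This is a plain textbook version written for this post, not MSTParser's implementation; the edge scores are invented, and the code assumes integer node ids and that every word has an edge from the root.

```python
def find_cycle(head):
    """head: dict node -> chosen head. Return the nodes of one cycle, or None."""
    for start in head:
        seen, node = set(), start
        while node in head and node not in seen:
            seen.add(node)
            node = head[node]
        if node in seen:  # walked back into the current path
            cycle, nxt = [node], head[node]
            while nxt != node:
                cycle.append(nxt)
                nxt = head[nxt]
            return cycle
    return None

def max_spanning_tree(scores, root=0):
    """scores: dict (head, dep) -> score. Returns dict dep -> head."""
    nodes = {h for h, _ in scores} | {d for _, d in scores}
    # 1. Every non-root node greedily picks its best incoming edge.
    best = {}
    for (h, d), s in scores.items():
        if d != root and h != d and (d not in best or s > scores[(best[d], d)]):
            best[d] = h
    cycle = find_cycle(best)
    if cycle is None:
        return best
    # 2. Contract the cycle into a fresh node and rescore the edges around it.
    in_cycle = set(cycle)
    c = max(nodes) + 1
    new_scores, origin = {}, {}
    for (h, d), s in scores.items():
        if h in in_cycle and d in in_cycle:
            continue
        if d in in_cycle:    # entering edge: gain from replacing d's cycle edge
            key, val = (h, c), s - scores[(best[d], d)]
        elif h in in_cycle:  # leaving edge
            key, val = (c, d), s
        else:
            key, val = (h, d), s
        if key not in new_scores or val > new_scores[key]:
            new_scores[key] = val
            origin[key] = (h, d)
    # 3. Solve the smaller problem, then map edges back to the original nodes.
    tree = {}
    for d, h in max_spanning_tree(new_scores, root).items():
        orig_head, orig_dep = origin[(h, d)]
        tree[orig_dep] = orig_head
    for v in cycle:          # keep cycle edges except the one that was broken
        if v not in tree:
            tree[v] = best[v]
    return tree

# Made-up scores for ROOT=0, John=1, saw=2, Mary=3.
scores = {
    (0, 1): 9, (0, 2): 10, (0, 3): 9,
    (1, 2): 20, (2, 1): 30, (2, 3): 30,
    (3, 2): 0, (1, 3): 3, (3, 1): 11,
}
tree = max_spanning_tree(scores)
print(tree)
```

Because the search considers all edges at once rather than building the tree left to right, long-distance attachments compete on equal terms with local ones, which fits the note above about longer dependencies. The price is more work per sentence than a linear-time transition-based parser.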
DepParse


  • Dependency parser
  • Python
  • MIT License
  • Leif Johnson (UT Austin)
English
  • Includes a maximum spanning tree (MST) parser and a stack-based shift-reduce parser
  • Supports data parallelism on multicore machines
  • Performance has not been evaluated
  • Self-contained, easy to extend
No (since 2010)
pfp


  • Constituency parser
  • C++ and Python
  • GPL
  • Erik Frey, Norman Casagrande et al (Wavii Inc)
English
  • pfp — pretty fast statistical parser
  • Uses a PCFG and the CYK algorithm
  • 3-4x faster than the Stanford parser, and uses 5-8x less resident memory
  • Thread-safe/multi-core support
Yes [1]
MBSP


  • Shallow (dependency) parsing
  • Python
  • GPL and Commercial
English
  • Memory-Based Shallow Parser, based on the TiMBL and MBT memory-based learning applications
  • No need for manual pattern or grammar definition
  • Client-server architecture
  • Do shallow parsing,
  • Shares an API with Pattern
  • Can be used together with DeSR and NLTK
Yes
OpenNLP Parser


  • Constituency parser
  • Java
  • Apache License (An Apache project)
English
  • A chunking parser (relatively simple)
  • Can be used with UIMA
Yes
Senna


  • Constituency parser
  • C
  • a non-commercial license
English
  • Uses deep learning
  • Very small code (3500 lines)
  • syntactic parsing
  • State-of-the-art performance
  • Pro