[Corpora-List] CMU ARK TurboParser 2.0 released!

Andre Martins afm at cs.cmu.edu
Tue Sep 25 16:20:46 UTC 2012


We're pleased to announce a new release of TurboParser, version 2.0!

TurboParser is a free C++ implementation of a multilingual non-projective
dependency parser based on linear programming relaxations.

This package allows:
* learning a parser or a part-of-speech tagger from a treebank,
* running a parser or a part-of-speech tagger on new data,
* evaluating the results against a gold standard.

The new version introduces a number of features:
* no external dependencies on CPLEX or any other non-free LP solvers; instead,
   the decoder is now based on AD3, our free library for MAP inference.
* the parser now outputs dependency labels along with the backbone structure.
* we also provide a trainable part-of-speech tagger (called TurboTagger), which
   has state-of-the-art accuracy for English (97.3% on section 23 of the PTB) and
   is fast (~40,000 tokens per second).
* the parser is much faster (~50x) than in previous versions.

Parsing runtimes/accuracies in the PTB (with POS tags predicted by TurboTagger):

================================================================================
   Model                        Accuracy (UAS)    Runtime (tokens per sec.) (*)
================================================================================
   arc-factored (basic)             90.72                   ~4,300
   second order (standard)          92.57                   ~1,200
   + arbitrary siblings
     and head bigrams (full)        92.85                     ~900
================================================================================


(*) On a desktop machine with an Intel Core i7 CPU at 3.4 GHz and 8 GB RAM.
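
For readers unfamiliar with the accuracy metric above: UAS (unlabeled
attachment score) is the percentage of tokens whose predicted head matches the
gold-standard head. The sketch below is only an illustration and is not part of
TurboParser; the function name and the list-of-head-indices input format are
assumptions made for this example. It also counts every token, whereas
evaluation scripts often exclude punctuation, and this announcement does not
say which convention was used for the figures above.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Return UAS as a percentage.

    Both arguments are lists of sentences; each sentence is a list holding
    the head index of every token (0 = artificial root). This input format
    is an assumption for illustration, not TurboParser's file format.
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted data differ in sentence count")

    total = 0
    correct = 0
    for gold_sentence, predicted_sentence in zip(gold_heads, predicted_heads):
        if len(gold_sentence) != len(predicted_sentence):
            raise ValueError("gold and predicted sentences differ in length")
        for gold_head, predicted_head in zip(gold_sentence, predicted_sentence):
            total += 1
            if gold_head == predicted_head:
                correct += 1

    # Avoid division by zero on empty input.
    return 100.0 * correct / total if total else 0.0


# Toy data: two sentences, five tokens, one wrong head in the first sentence.
gold = [[2, 0, 2], [0, 1]]
predicted = [[2, 0, 1], [0, 1]]
print(unlabeled_attachment_score(gold, predicted))  # 80.0
```

A labeled score would additionally require the dependency label to match; that
variant is not shown here.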

For more information and to download the parser, go to:
http://www.ark.cs.cmu.edu/TurboParser

To download pre-trained models for English, go to:
http://www.ark.cs.cmu.edu/TurboParser/sample_models

To contribute to TurboParser:
http://github.com/andre-martins/TurboParser

--
Andre Martins
Priberam Labs, Lisbon, Portugal &
Instituto de Telecomunicacoes, Instituto Superior Tecnico, Lisbon, Portugal
