[Corpora-List] Cambridge SMT Toolkit - Open Source Release

Bill Byrne bill.byrne at eng.cam.ac.uk
Fri Jun 6 17:30:02 UTC 2014


Statistical machine translation tools developed at Cambridge University are now available at http://ucam-smt.github.io/ .  

This is an initial release, featuring:
- HiFST -- Hierarchical phrase-based statistical machine translation based on the Google OpenFst Toolkit http://openfst.org 
- Direct production of translation lattices as Weighted Finite-State Automata (WFSAs)
- Efficient WFSA rescoring procedures
- OpenFst wrappers for direct inclusion of KenLM and ARPA language models as WFSAs
- Lattice Minimum Error Rate Training
- Lattice Minimum Bayes Risk decoding
- Recursive Transition Networks and Pushdown Automata
- Client/Server mode
- WFSA true-casing
- and much more

A tutorial (http://ucam-smt.github.io/tutorial) based on the Cambridge 2013 WMT Russian-English system is also included.

To get the toolkit:
- https://github.com/ucam-smt/ucam-smt/archive/master.zip
- git clone https://github.com/ucam-smt/ucam-smt.git

--
Bill Byrne
University of Cambridge
http://mi.eng.cam.ac.uk/~wjb31    


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora