[Corpora-List] Docent - a Document-Level Local Search Decoder for Phrase-Based Statistical Machine Translation
Joerg Tiedemann
jorg.tiedemann at lingfil.uu.se
Wed Sep 5 16:20:27 UTC 2012
The following announcement may be of interest to the SMT people on this list.
My apologies for cross-postings.
Jörg
The Computational Linguistics group at Uppsala University is happy to announce
the first public release of
******************************************************************************
Docent - a Document-Level Local Search Decoder for
Phrase-Based Statistical Machine Translation
******************************************************************************
Docent is a decoder for phrase-based SMT. It is currently the only publicly
available SMT decoder that allows the inclusion of discourse-wide features.
* Documents as Translation Units
Other decoders translate sentence by sentence. Docent treats documents as
units and lets you create SMT models that exploit discourse-wide,
cross-sentence information.
* Local Search Decoding
Unlike other decoders, Docent decodes by local search. This frees it from the
constraints of traditional dynamic programming (DP) beam search and lets it
handle long-range dependencies across sentence boundaries.
* Full Integration with Dynamic Programming Beam Search
Docent can be linked against Moses so that the search is initialised with
traditional DP beam search and local search runs as a second pass, combining
the effectiveness of DP search with the versatility of local search.
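The two-pass idea described above can be illustrated with a toy sketch. This is
not Docent's code or API: every name below is hypothetical, a sentence-wise
argmax stands in for the DP beam search pass, simple hill climbing with a single
"change one sentence" move stands in for the local search, and the consistency
bonus is an invented example of a cross-sentence feature.

```python
import random

# Hypothetical toy data: for each sentence, a list of (word_choice, sentence_score).
candidates = [
    [("bank", 2.0), ("shore", 0.125)],
    [("shore", 0.5), ("bank", 0.25)],
    [("bank", 2.0), ("shore", 0.125)],
]

def sentence_best(candidates):
    """First pass (stand-in for DP beam search): best option per sentence in isolation."""
    return [max(range(len(opts)), key=lambda i: opts[i][1]) for opts in candidates]

def document_score(candidates, state):
    """Sum of sentence-level scores plus a toy document-level consistency bonus."""
    local = sum(candidates[s][i][1] for s, i in enumerate(state))
    words = [candidates[s][i][0] for s, i in enumerate(state)]
    # Invented discourse feature: +1 for each adjacent sentence pair with the same choice.
    consistency = sum(1.0 for a, b in zip(words, words[1:]) if a == b)
    return local + consistency

def local_search(candidates, state, steps=200, seed=0):
    """Second pass: hill climbing over the whole document state."""
    rng = random.Random(seed)
    state = list(state)
    best = document_score(candidates, state)
    for _ in range(steps):
        s = rng.randrange(len(state))          # pick a sentence at random
        i = rng.randrange(len(candidates[s]))  # pick an alternative for it
        if i == state[s]:
            continue
        proposal = state[:s] + [i] + state[s + 1:]
        score = document_score(candidates, proposal)
        if score > best:                       # accept only improving moves
            state, best = proposal, score
    return state, best

initial = sentence_best(candidates)
final, final_score = local_search(candidates, initial)
print("first pass: ", initial, document_score(candidates, initial))
print("second pass:", final, final_score)
```

In this sketch the first pass picks each sentence's best option independently,
so it cannot see the consistency bonus. The second pass scores the document as
a whole and can therefore trade a slightly worse sentence score for a better
document score.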
Additional features to support model development include:
- direct support for NIST-XML and MMAX file formats
- integrated Snowball stemmer
- support for LSA-based semantic spaces
Docent is distributed under the GNU General Public License (GPL).
Read more about it at
https://github.com/chardmeier/docent/wiki
and get the code from
https://github.com/chardmeier/docent
Note: This software is aimed at researchers who want to develop discourse-level
SMT models. If you're looking for a mature package to use in a production
environment, it's not for you. If you want to invent tomorrow's most exciting
SMT models, it is.
Inquiries can be addressed to docent at stp.lingfil.uu.se