[Corpora-List] Docent - a Document-Level Local Search Decoder for Phrase-Based Statistical Machine Translation

Joerg Tiedemann jorg.tiedemann at lingfil.uu.se
Wed Sep 5 16:20:27 UTC 2012


The following announcement may be of interest to the SMT people on this list.
Apologies for cross-posting.

Jörg



The Computational Linguistics group at Uppsala University is happy to announce
the first public release of

******************************************************************************
Docent - a Document-Level Local Search Decoder for
        Phrase-Based Statistical Machine Translation
******************************************************************************

Docent is a decoder for phrase-based SMT. It is currently the only publicly
available SMT decoder that allows the inclusion of discourse-wide features.

* Documents as Translation Units
    Other decoders translate sentence by sentence. Docent treats documents as
    units and lets you create SMT models that exploit discourse-wide,
    cross-sentence information.

* Local Search Decoding
    Unlike other decoders, Docent uses a decoding approach based on local
    search, which escapes the locality constraints of traditional dynamic
    programming (DP) beam search and can handle long-range dependencies across
    sentence boundaries.

* Full Integration with Dynamic Programming Beam Search
    Docent can link against Moses to initialise the search with a traditional
    DP beam search pass and then run local search as a second pass, combining
    the effectiveness of DP search with the versatility of local search.
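To illustrate the general idea behind document-level local search, here is a minimal Python sketch under simplifying assumptions. It is not Docent's code or API: the toy model and the names chosen_options, toy_document_score and local_search are hypothetical. Each sentence picks one of a few candidate translations, the document score adds a discourse-wide lexical consistency term to sentence-local scores, and hill climbing starts from a sentence-by-sentence first pass (standing in for the DP beam search initialisation) and keeps only changes that raise the score of the whole document.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy model (not Docent's API): each sentence is translated by
# choosing one option, an option being (target word, sentence-local score).
Option = Tuple[str, float]
State = List[int]  # state[i] = index of the option chosen for sentence i


def chosen_options(candidates: Sequence[Sequence[Option]], state: State) -> List[Option]:
    """Return the option selected for each sentence in a document state."""
    return [candidates[i][c] for i, c in enumerate(state)]


def toy_document_score(chosen: Sequence[Option], consistency_weight: float = 1.0) -> float:
    """Sentence-local scores plus one discourse-wide feature."""
    local = sum(score for _, score in chosen)
    # Cross-sentence term: penalise each extra distinct word used for what is
    # assumed here to be the same concept throughout the document.
    distinct = len({word for word, _ in chosen})
    return local - consistency_weight * (distinct - 1)


def local_search(
    candidates: Sequence[Sequence[Option]],
    score: Callable[[Sequence[Option]], float],
    initial: State,
    max_steps: int = 1000,
    seed: int = 0,
) -> State:
    """Hill climbing over complete document states."""
    rng = random.Random(seed)
    state = list(initial)
    current = score(chosen_options(candidates, state))
    for _ in range(max_steps):
        # Propose a change to the translation of one randomly chosen sentence.
        i = rng.randrange(len(state))
        j = rng.randrange(len(candidates[i]))
        if j == state[i]:
            continue
        proposal = state.copy()
        proposal[i] = j
        # Rescore the whole document, so cross-sentence features are included.
        new = score(chosen_options(candidates, proposal))
        if new > current:  # keep strict improvements only
            state, current = proposal, new
    return state


candidates = [
    [("car", 1.0), ("automobile", 0.8)],
    [("automobile", 1.0), ("car", 0.9)],
    [("car", 1.0), ("automobile", 0.7)],
]
# Stand-in for a first pass: pick each sentence's best option independently.
initial = [max(range(len(opts)), key=lambda k: opts[k][1]) for opts in candidates]
result = local_search(candidates, toy_document_score, initial)
print([word for word, _ in chosen_options(candidates, initial)])  # ['car', 'automobile', 'car']
print([word for word, _ in chosen_options(candidates, result)])   # ['car', 'car', 'car']
```

Because the consistency term depends on all sentences at once, it does not decompose the way sentence-by-sentence DP search requires, whereas local search only ever needs to rescore complete document states. A real phrase-based decoder works with much richer states and change operations than this single-choice toy.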

Additional features to support model development include:
- direct support for NIST-XML and MMAX file formats
- integrated Snowball stemmer
- support for LSA-based semantic spaces


Docent is distributed under the GNU General Public License (GPL).

Read more about it at
       https://github.com/chardmeier/docent/wiki
and get the code from
       https://github.com/chardmeier/docent

Note: This software is aimed at researchers who want to develop discourse-level
SMT models. If you're looking for a mature package to use in a production
environment, it's not for you. If you want to invent tomorrow's most exciting
SMT models, it is.

Inquiries can be addressed to docent at stp.lingfil.uu.se

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora