[Corpora-List] Tool for raw parallel corpora alignment

Chris Dyer redpony at umd.edu
Mon Mar 16 15:23:36 UTC 2009


You might have a look at MTTK:
  http://mi.eng.cam.ac.uk/~wjb31/distrib/mttkv1/
It will do both sentence and word alignment for you, all within one
tool.  Its performance should be fairly close to the state-of-the-art
for most language pairs.

Chris


On Mon, Mar 16, 2009 at 2:26 PM, Emmanuel Prochasson
<emmanuel.prochasson at univ-nantes.fr> wrote:
> Dear all,
>
> I am looking for a tool to perform word-level alignment on a raw
> parallel corpus. That is, given two texts that are translations of each
> other, output a word-level alignment (my goal is to use this word
> alignment output to quickly build a bilingual lexicon).
>
> I found and tried many tools, but ran into several difficulties:
> - most of them process already aligned documents (and require another
> tool to perform sentence alignment, which requires another tool...). I
> need one that can process raw text documents
> - a lot of them are really outdated (computer-history speaking) and
> don't compile well with "modern" C, C++ or Java compilers
>
> I don't really need the best alignment software ever. I need something
> quite simple, that can be used in a fully automatic process (that means,
> no windows GUI), even if it has a "low" precision compared to best
> results obtained by researchers or industry.
>
> Do you have any clue?
>
> Thanks,
>
> --
> Emmanuel
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>

