[Corpora-List] comparable corpora and computer-aided translation

Dom Widdows widdows at google.com
Mon Nov 16 14:41:52 UTC 2009


Dear Xiaotian,

One paper on finding translations without parallel corpora is:
Learning Bilingual Lexicons from Monolingual Corpora
Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick and Dan Klein, ACL 2008
http://www.eecs.berkeley.edu/~aria42/pubs/acl2008-unsup-bilexicon.pdf

In general, I think there has been a lot of good work that uses
language models for the target language, built from large monolingual
corpora. For example, you can use a smaller parallel French-English
corpus to translate into English, and a large English-only corpus to
help "clean up" the output so that the translation reads as
"reasonable English". At least, that's my cartoon view of the
general idea; I'm sure there are many experts out there who can enrich
or correct this summary.
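
To make that cartoon view a little more concrete, here is a minimal
sketch in Python. It is only an illustration under toy assumptions, not
how any particular system works: the three-sentence "monolingual
corpus", the candidate translations, and the function names
(train_bigram_lm, log_prob, rerank) are all made up for the example.
It trains an add-one-smoothed bigram model on English-only text and
uses it to prefer the more fluent of several candidate translations.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Counter returns 0 for unseen keys, so smoothing handles new bigrams.
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rerank(candidates, unigrams, bigrams):
    """Sort candidate translations from most to least fluent."""
    return sorted(candidates, key=lambda c: log_prob(c, unigrams, bigrams), reverse=True)


# Toy stand-in for a large English-only corpus (hypothetical data).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat is on the mat",
]
lm = train_bigram_lm(corpus)

# Toy stand-ins for outputs of a translation model trained on parallel data.
candidates = ["cat the on sat mat the", "the cat sat on the mat"]
ranked = rerank(candidates, *lm)
print(ranked[0])
```

A real system would combine this language-model score with a
translation-model score rather than use it alone, and would use a far
larger corpus and a better smoothing scheme.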

Best wishes,
Dominic

On Sun, Nov 15, 2009 at 5:11 PM, Xiaotian Guo <garlickfred at gmail.com> wrote:
> Dear Corpora Colleagues
>
> The use of bilingual parallel corpora in computer-aided translation (CAT)
> is now widely acknowledged and applied. I wonder whether there
> has been substantial progress in using comparable corpora in
> CAT. I am aware of Belinda Maia's article "Some Languages are more Equal
> than Others: training translators in terminology and information retrieval
> using comparable and parallel corpora" in Corpora in Translator Education,
> 2003. Is there any other literature on this topic?
>
> If you have any ideas on how comparable corpora can be used in CAT (they
> need not be mature), please share them with me.
>
> All the best
>
> Xiaotian Guo
>
> SOAS & New Vision Language Centre
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
