[Corpora-List] comparable corpora and computer-aided translation: a summary

Xiaotian Guo garlickfred at gmail.com
Sat Jan 9 22:48:14 UTC 2010


Dear Corpora Colleagues

Happy New Year!

Some time ago I posted a query, "comparable corpora and computer-aided
translation", asking about progress in applying comparable corpora to
computer-aided translation and about possible readings. Here is a belated
summary of the replies. I would like to thank all of the colleagues below
for their contributions.

 All the best

Xiaotian Guo
SOAS & New Vision Language Centre

-----------------------------------------------------

1. *Gill Philip* recommends an article of hers: Gill Philip (2009) Arriving
at equivalence: Making a case for comparable general reference corpora in
translation studies. In Allison Beeby, Patricia Rodríguez Inés & Pilar
Sánchez-Gijón (eds) Corpus Use and Translating: Corpus use for learning to
translate and learning corpus use to translate, pp. 59-73. Amsterdam /
Philadelphia: John Benjamins.



2. *Paul Rayson* replies as follows:



You should have a look at the output from the ASSIST project involving
Lancaster and Leeds. Papers are available from:

http://ucrel.lancs.ac.uk/projects/assist/

http://www.comp.leeds.ac.uk/ssharoff/



3. *Dominic Widdows* stresses the usefulness of comparable corpora and
recommends a paper, as follows:



One paper on finding translations without parallel corpora is:

Learning Bilingual Lexicons from Monolingual Corpora. Aria Haghighi, Percy
Liang, Taylor Berg-Kirkpatrick and Dan Klein. ACL 2008.

http://www.eecs.berkeley.edu/~aria42/pubs/acl2008-unsup-bilexicon.pdf



In general I think there has been a lot of good work that uses language
models for the target language built from large monolingual corpora. E.g.,
you can use a smaller parallel French-English corpus to translate into
English, and a large English-only corpus to help "clean up" your translation
and make sure the output is "reasonable English". At least, that's my
cartoon view of the general idea; I'm sure there are many experts out there
who can enrich or correct this summary.
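
[Editor's illustration] The "clean up" idea above can be sketched as
reranking: a bigram language model is trained on English-only text and used
to pick the most fluent of several candidate translations. Everything below
is an assumption made for illustration (the toy sentences, the add-one
smoothing, and the function names train_bigram_lm, log_prob and rerank); it
is not the method of any paper cited in this summary, and a real SMT system
would combine the language model score with a translation model score.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rerank(candidates, unigrams, bigrams):
    """Return the candidate the language model finds most fluent."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


# Hypothetical stand-in for a large English-only corpus.
monolingual_english = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat is on the mat",
]

# Hypothetical candidate outputs from a French-English translation model.
candidates = ["the cat sat on the mat", "cat the mat on sat the"]

unigrams, bigrams = train_bigram_lm(monolingual_english)
best = rerank(candidates, unigrams, bigrams)
print(best)
```

Both candidates contain the same words, so only word order separates them;
the language model prefers the order it has seen in the monolingual data.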



 4. *Nitin Madnani* enriches the list of readings as follows:



You may also look at the following papers/resources on leveraging comparable
data for SMT:

(a) Language and Translation Model Adaptation using Comparable Corpora.
Matthew Snover, Bonnie J. Dorr, and Richard Schwartz. EMNLP 2008.

(b) Dragos Stefan Munteanu and Daniel Marcu. 2005. Improving machine
translation performance by exploiting non-parallel corpora. Computational
Linguistics, 31(4):477–504.

(c) The proceedings for the workshop on Building and Using Comparable
Corpora (http://comparable2009.ust.hk/). There have been two so far, I
believe.



5. *Yannick Versley* recommends a paper from the perspective of
computational linguistics:



This is also a bit on the computational side (rather than applied corpus
linguistics), but it may be interesting: Pekar V., Mitkov R., Blagoev D.,
and Mulloni A. (2007). Finding Translations for Low-Frequency Words in
Comparable Corpora. In Proceedings of the CONTEXT-07 Workshop on "Contextual
Information in Semantic Space Models" (CoSMo-2007). Roskilde, Denmark.
pp.17-25. http://home.wlv.ac.uk/~in8113/papers/cosmo07_pekar_et_al.pdf



 6. *Stella Tagnin* mentions two papers (one written in Portuguese) as
follows:



British vs. American English, Brazilian vs. European Portuguese: how close
or how far apart? - a corpus-driven study (Frankfurt am Main: Lodz Studies
in Language 9, 2004, p. 193-208)

Stella E. O. Tagnin & Elisa Duarte Teixeira
(http://www.fflch.usp.br/dlm/comet/artigos/BRITISH%20VS.%20AMERICAN%20ENGLISH.pdf)



A identificação de equivalentes tradutórios em corpora comparáveis (Anais do
I Congresso Internacional da ABRAPUI: Belo Horizonte, 3 a 6 de junho de
2007)

Stella E. O. Tagnin
(http://www.fflch.usp.br/dlm/comet/Novo/Stella_Abrapui%202007_artigo.pdf)

------------------------------------------------