[Corpora-List] RE: Evaluating Sentence Aligners

Ruprecht von Waldenfels Rvwfels at gmx.de
Thu Oct 4 08:57:34 UTC 2007


Hello, 

I've done a comparison of two aligners (hunalign and bsa) that you might find interesting. I examined language pairs drawn from Polish, Russian and German, and investigated how lemmatizing the texts before alignment can improve alignment quality. The bibliographic reference is: R. v. Waldenfels, Compiling a parallel corpus of Slavic languages. In: Brehmer, B., Ždanova, V., Zimny, R. (eds.) 2006. Beiträge der Europäischen Slavistischen Linguistik (POLYSLAV) 9. München, 123-138.

It can be downloaded from my web site at 
http://www-nw.uni-regensburg.de/%7E.war05297.slavistik.sprachlit.uni-regensburg.de/

All the best, 
Ruprecht v. Waldenfels



Ruprecht v. Waldenfels, M.A.
Institut für Slavistik, Universität Regensburg
Universitätsstr. 31, 93051 Regensburg
ruprecht dot waldenfels at sprachlit.uni-regensburg.de
skype: rvwaldenfels
Tel. +49 (0) 941 943 3399
Fax. +49 (0) 941 943 1991


<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


Hello CORPORA List,

I'd be interested to hear from colleagues who are evaluating, or have evaluated, automatic sentence alignment in parallel corpora. I'm especially interested in work with "distant" language pairs (e.g., Ar-En, Zh-En). I expect that some of the methods you have used resemble those of Arcade 2 (Chiao, Kraif, et al., 2006), (Rosen, 2005) or (Singh & Husain, 2005). However, you may be working only in a specific domain, or at a small scale (comparing fewer than five aligners). I'd be curious to hear about your experiences, since I've been testing sentence aligners on text in the government/foreign affairs domain. I'd also welcome suggestions from anybody who has tried to incorporate usability testing into an evaluation of automatic sentence alignment: for example, by monitoring how much manual correction users had to do after the alignment.

Thanks,

Eric Garbin

Computational Linguist

The Technology Development Group

www.thetdgroup.com

571-262-2693 

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


