[Corpora-List] Arabic transcript marked with pauses

Claire Brierley C.Brierley at leeds.ac.uk
Mon Apr 8 10:00:50 UTC 2013


Researchers at the Universities of Leeds and Jordan are looking for a small test corpus (5000+ words) of transcribed Modern Standard Arabic (MSA) annotated with phrase breaks. The breaks should delineate well-formed, meaningful chunks and should not represent disfluencies. To illustrate the kind of annotation we are looking for, we have uploaded a single MSA sentence of 48 words: http://www.comp.leeds.ac.uk/claireb/msaSentence.pdf

In this example, only two words are followed by punctuation, and we have identified these as breaks. We have also tagged a few other words as likely boundary locations. If you know of or have such a resource, we would love to hear from you.

Thanks,

Claire Brierley
Senior Research Fellow
Computing
University of Leeds, UK