[Corpora-List] Arabic transcript marked with pauses
Claire Brierley
C.Brierley at leeds.ac.uk
Mon Apr 8 10:00:50 UTC 2013
Researchers at the Universities of Leeds and Jordan are looking for a small test corpus (5000+ words) of transcribed Modern Standard Arabic (MSA) annotated with phrase breaks. The breaks should delineate well-formed, meaningful chunks and should not represent disfluencies. To illustrate the kind of annotation we are looking for, we have uploaded a single MSA sentence of 48 words: http://www.comp.leeds.ac.uk/claireb/msaSentence.pdf
In this example, only two words are followed by punctuation, and we have identified these as breaks. We have also tagged a few other words as likely boundary locations. If you know of or have such a resource, we would love to hear from you.
Thanks,
Claire Brierley
Senior Research Fellow
Computing
University of Leeds, UK