Arabic-L:LING:Needs Test Corpus with Phrase Breaks
Dilworth Parkinson
dilworthparkinson at GMAIL.COM
Tue Apr 9 16:14:18 UTC 2013
------------------------------------------------------------------------
Arabic-L: Tue 09 Apr 2013
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
unsubscribe arabic-l ]
-------------------------Directory------------------------------------
1) Subject:Needs Test Corpus with Phrase Breaks
-------------------------Messages-----------------------------------
1)
Date: 09 Apr 2013
From:Eric Atwell <E.S.Atwell at leeds.ac.uk>
Subject:Needs Test Corpus with Phrase Breaks
Researchers at the Universities of Leeds and Jordan are looking for a
small test corpus (5000+ words) of transcribed Modern Standard Arabic (MSA)
annotated with PHRASE BREAKS. The latter should delineate well-formed,
meaningful chunks and should not represent disfluencies. To illustrate the
kind of thing we are looking for, here is a single MSA sentence of 48 words:
http://www.comp.leeds.ac.uk/claireb/msaSentence.pdf
In this example, only two words are followed by punctuation, and we
have identified these as breaks. We have also tagged a few other words
as likely boundary locations. If you know of or have such a resource,
we would love to hear from you.
Thanks,
Claire Brierley
C.Brierley at leeds.ac.uk
Senior Research Fellow
School of Computing, University of Leeds, UK
--------------------------------------------------------------------------
End of Arabic-L: 09 Apr 2013