<div dir="ltr">------------------------------------------------------------------------<br>Arabic-L: Tue 09 Apr 2013<br>Moderator: Dilworth Parkinson <<a href="mailto:dilworth_parkinson@byu.edu" target="_blank">dilworth_parkinson@byu.edu</a>><br>


[To post messages to the list, send them to <a href="mailto:arabic-l@byu.edu" target="_blank">arabic-l@byu.edu</a>]<br>[To unsubscribe, send message from same address you subscribed from to<br><a href="mailto:listserv@byu.edu" target="_blank">listserv@byu.edu</a> with first line reading:<br>


           unsubscribe arabic-l                                      ]<br><br>-------------------------Directory------------------------------------<br><br>1) Subject: Needs Test Corpus with Phrase Breaks<br><br>-------------------------Messages-----------------------------------<br>


1)<br>Date: 09 Apr 2013<br>From: <span style="font-family:arial,sans-serif;font-size:13px">Eric Atwell <<a href="mailto:E.S.Atwell@leeds.ac.uk" target="_blank">E.S.Atwell@leeds.ac.uk</a>></span><br>Subject: Needs Test Corpus with Phrase Breaks<br>

<br><span style="font-family:arial,sans-serif;font-size:13px">Researchers at the Universities of Leeds and Jordan are looking for a</span><br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">small test corpus (5000+ words) of transcribed Modern Standard Arabic (MSA) annotated with PHRASE BREAKS. These breaks should delineate well-formed, meaningful chunks of text and should not mark disfluencies. To illustrate the kind of thing we are looking for, here is</span><br style="font-family:arial,sans-serif;font-size:13px">

<span style="font-family:arial,sans-serif;font-size:13px">a single MSA sentence of 48 words:</span><br style="font-family:arial,sans-serif;font-size:13px"><a href="http://www.comp.leeds.ac.uk/claireb/msaSentence.pdf" style="font-family:arial,sans-serif;font-size:13px" target="_blank">http://www.comp.leeds.ac.uk/claireb/msaSentence.pdf</a><br style="font-family:arial,sans-serif;font-size:13px">

<br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">In this example, only two words are followed by punctuation, and we</span><br style="font-family:arial,sans-serif;font-size:13px">

<span style="font-family:arial,sans-serif;font-size:13px">have identified these as breaks. We have also tagged a few other words as likely boundary locations. If you know of or have such a resource, we would love to hear from you.</span><br style="font-family:arial,sans-serif;font-size:13px">

<br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">Thanks,</span><br style="font-family:arial,sans-serif;font-size:13px"><br style="font-family:arial,sans-serif;font-size:13px">

<span style="font-family:arial,sans-serif;font-size:13px">Claire Brierley, </span><a href="mailto:C.Brierley@leeds.ac.uk" style="font-family:arial,sans-serif;font-size:13px" target="_blank">C.Brierley@leeds.ac.uk</a><span style="font-family:arial,sans-serif;font-size:13px">, Senior Research Fellow</span><br style="font-family:arial,sans-serif;font-size:13px">

<span style="font-family:arial,sans-serif;font-size:13px">School of Computing, University of Leeds, UK</span><br style="font-family:arial,sans-serif;font-size:13px"><br><div>--------------------------------------------------------------------------<br>

End of Arabic-L: 09 Apr 2013<br></div></div>