<div dir="ltr">------------------------------------------------------------------------<br>Arabic-L: Fri 16 Nov 2012<br>Moderator: Dilworth Parkinson <<a href="mailto:dilworth_parkinson@byu.edu" target="_blank">dilworth_parkinson@byu.edu</a>><br>
[To post messages to the list, send them to <a href="mailto:arabic-l@byu.edu" target="_blank">arabic-l@byu.edu</a>]<br>[To unsubscribe, send message from same address you subscribed from to<br><a href="mailto:listserv@byu.edu" target="_blank">listserv@byu.edu</a> with first line reading:<br>
unsubscribe arabic-l ]<br><br>-------------------------Directory------------------------------------<br><br>1) Subject:GALE Phase 2 Arabic Newswire Parallel Text<br><br>-------------------------Messages-----------------------------------<br>
1)<br>Date: 16 Nov 2012<br>From:<span name="Linguistic Data Consortium" style="font-size:13.333333969116211px;font-family:arial,sans-serif">Linguistic Data Consortium</span><span style="font-family:arial,sans-serif;font-size:13.333333969116211px;white-space:nowrap"> </span><span style="font-family:arial,sans-serif;font-size:13.333333969116211px;white-space:nowrap"><a href="mailto:ldc@ldc.upenn.edu" target="_blank">ldc@ldc.upenn.edu</a></span><br>
Subject:GALE Phase 2 Arabic Newswire Parallel Text<br><br><p style="font-family:arial,sans-serif;font-size:13.333333969116211px"><a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T17" target="_blank">GALE Phase 2 Arabic Newswire Parallel Text</a> was developed by LDC. Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA GALE (Global Autonomous Language Exploitation) Program. This corpus contains Modern Standard Arabic source text and corresponding English translations selected from newswire data collected in 2007 by LDC and transcribed by LDC or under its direction.</p>
<p style="font-family:arial,sans-serif;font-size:13.333333969116211px">GALE Phase 2 Arabic Newswire Parallel Text includes 400 source-translation pairs, comprising 181,704 tokens of Arabic source text and its English translation. Data is drawn from six distinct Arabic newswire sources: Al Ahram, Al Hayat, Al-Quds Al-Arabi, An Nahar, Asharq Al-Awsat and Assabah.</p>
<p style="font-family:arial,sans-serif;font-size:13.333333969116211px">The files in this release were transcribed by LDC staff and/or transcription vendors under contract to LDC in accordance with the <a href="http://projects.ldc.upenn.edu/gale/Transcription/Arabic-XTransQRTR.V3.pdf" target="_blank">Quick Rich Transcription</a> guidelines developed by LDC. Transcribers indicated sentence boundaries in addition to transcribing the text. Data was manually selected for translation according to several criteria, including linguistic features, transcription features and topic features. The transcribed and segmented files were then reformatted into a human-readable translation format and assigned to translation vendors. Translators followed LDC's Arabic-to-English translation guidelines. Bilingual LDC staff performed quality control procedures on the completed translations.</p>
<p style="font-family:arial,sans-serif;font-size:13.333333969116211px">GALE Phase 2 Arabic Newswire Parallel Text is distributed via web download.</p><p style="font-family:arial,sans-serif;font-size:13.333333969116211px">
2012 Subscription Members will automatically receive two copies of this data on disc. 2012 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1750.<br>
</p><hr width="100%" size="2" style="font-family:arial,sans-serif;font-size:13.333333969116211px"><br>--------------------------------------------------------------------------<br>
End of Arabic-L: 16 Nov 2012<br></div>