Arabic-L:LING:LDC GALE Phase 2 Parallel Text

Dilworth Parkinson dilworthparkinson at GMAIL.COM
Thu Nov 1 18:26:19 UTC 2012


------------------------------------------------------------------------
Arabic-L: Thu 01 Nov 2012
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
           unsubscribe arabic-l                                      ]

-------------------------Directory------------------------------------

1) Subject:LDC GALE Phase 2 Parallel Text

-------------------------Messages-----------------------------------
1)
Date: 01 Nov 2012
From:Linguistic Data Consortium ldc at ldc.upenn.edu
Subject:LDC GALE Phase 2 Parallel Text

(2) GALE Phase 2 Arabic Broadcast News Parallel Text
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T18>
was developed by LDC. Along with other corpora, the parallel text in this
release comprised training data for Phase 2 of the DARPA GALE (Global
Autonomous Language Exploitation) Program. This corpus contains Modern
Standard Arabic source text and corresponding English translations selected
from broadcast news (BN) data collected by LDC between 2005 and 2007 and
transcribed by LDC or under its direction.

GALE Phase 2 Arabic Broadcast News Parallel Text includes seven
source-translation pairs, comprising 29,210 words of Arabic source text and
its English translation. Data is drawn from six distinct Arabic programs
broadcast between 2005 and 2007 from Abu Dhabi TV, based in Abu Dhabi,
United Arab Emirates; Al Alam News Channel, based in Iran; Aljazeera, a
regional broadcast programmer based in Doha, Qatar; Dubai TV, based in
Dubai, United Arab Emirates; and Kuwait TV, a national television station
based in Kuwait. The BN programming in this release focuses on current
events topics.

The files in this release were transcribed by LDC staff and/or
transcription vendors under contract to LDC in accordance with the Quick
Rich Transcription guidelines
<http://projects.ldc.upenn.edu/gale/Transcription/Arabic-XTransQRTR.V3.pdf>
developed by LDC. Transcribers indicated sentence boundaries in addition to
transcribing the text. Data was manually selected for translation according
to several criteria, including linguistic features, transcription features
and topic features. The transcribed and segmented files were then
reformatted into a human-readable translation format and assigned to
translation vendors. Translators followed LDC's Arabic-to-English
translation guidelines. Bilingual LDC staff performed quality control
procedures on the completed translations.

--------------------------------------------------------------------------
End of Arabic-L: 01 Nov 2012