<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p class="MsoNormal" align="center"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><br>
<o:p></o:p></span></p>
<p class="MsoNormal" align="center"><a href="#scholar"><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">- Fall 2012 LDC Data Scholarship Recipients -<o:p></o:p></span></b></a></p>
<p class="MsoNormal" align="center"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><a href="#wiki"><b>- Language Resource Wiki -</b></a><o:p></o:p></span></p>
<p class="MsoNormal" align="center"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><i>New publications:</i><o:p></o:p></span></p>
<p class="MsoNormal" align="center"><a href="#gale1"><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">- GALE Chinese-English Word Alignment and
Tagging Training Part 2 -- Newswire -<o:p></o:p></span></b></a></p>
<p class="MsoNormal" align="center"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><a href="#gale2"><b>- GALE Phase 2 Arabic
Broadcast News Parallel Text -</b></a></span></p>
<hr size="2" width="100%">
<p class="MsoNormal" align="center"><a name="scholar"></a><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><b>Fall 2012 LDC Data Scholarship Recipients</b></span><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">LDC is pleased to announce the student recipients
of the Fall 2012 LDC Data Scholarship program! This program
provides university and college students with access to LDC data
at no cost. Students were asked to complete an application
consisting of a proposal describing their intended use of the
data and a letter of support from their thesis adviser.
We received many solid applications and have chosen six
proposals to support. The
following students received no-cost copies of LDC data:<o:p></o:p></span></p>
<blockquote>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">Jaffar Atwan - National University of Malaysia
(Malaysia), PhD candidate, Information Science and
Technology. Jaffar has been awarded a copy of Arabic Newswire
Part 1 (LDC2001T55) for his work in information retrieval.<br>
<br>
Sarath Chandar - Indian Institute of Technology, Madras
(India), MS candidate, Computer Science and Engineering.
Sarath has been awarded a copy of Treebank-3 (LDC99T42) for
his work in grammar induction.<br>
<br>
Kuruvachan K. George - Amrita Vishwa Vidyapeetham (India), PhD
candidate, Electrical and Computer Engineering. Kuruvachan
has been awarded a copy of Fisher English Part 2
(LDC2005S13/T19) and 2008 <a name="top">NIST Speaker
Recognition Evaluation</a> data (LDC2011S05/07/08/11) for his work in
speaker recognition.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">Eduardo Motta - Pontifícia Universidade Católica
do Rio de Janeiro (Brazil), PhD candidate, Information
Sciences. Eduardo has been awarded a copy of English Web
Treebank (LDC2012T13) for his work in machine learning.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">Genevieve Sapijaszko - University of Central
Florida (USA), PhD candidate, Electrical and Computer
Engineering. Genevieve
has been awarded a copy of TIMIT Acoustic-Phonetic Continuous
Speech Corpus (LDC93S1) and YOHO Speaker Verification
(LDC94S16) for her work in digital signal processing.<br>
<br>
John Steinberg - Temple University (USA), MS
candidate, Electrical and Computer Engineering. John has been
awarded a copy of CALLHOME Mandarin Chinese Lexicon (LDC96L15)
and CALLHOME Mandarin Chinese Transcripts (LDC96T16) for his
work in speech recognition.<o:p></o:p></span></p>
</blockquote>
<span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><br>
<o:p></o:p></span>
<p class="MsoNormal" align="center"><a name="wiki"></a><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><b>Language Resource Wiki</b><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span>The <a
href="http://lrwiki.ldc.upenn.edu/">Language Resource Wiki</a>
catalogs data, software, descriptive grammars and other resources
for a variety of languages, especially those with few
generally available resources for research. LDC is actively
seeking editors knowledgeable in these and other languages to
develop and maintain the pages, which are readable by anyone but
writable only by editors. The wiki currently has resource listings
for: Bengali, Berber, Breton, Ewe, Greek (Ancient), Indonesian,
Hindi, Latin, Panjabi, Pashto, Sorani (Central Kurdish), Russian,
Tagalog, Tamil, and Urdu, and for the following Sign Languages:
American, British, Catalan, Dutch, Flemish, German, Japanese, New
Zealand, Polish, Spanish, and Swiss German. <span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><br>
<br>
<o:p></o:p></span></p>
<p class="MsoNormal" align="center"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p><b>New
publications</b><br>
</o:p></span></p>
<p class="MsoNormal"><a name="gale1"></a><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">(1) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T18">GALE
Chinese-English
Word Alignment and Tagging Training Part 2 -- Newswire</a> was
developed by LDC and contains 169,080 tokens of word-aligned
Chinese and English parallel text enriched with linguistic tags.
This material was used as training data in the <a
href="http://projects.ldc.upenn.edu/gale/index.html">DARPA
GALE</a> (Global Autonomous Language Exploitation) program. <o:p></o:p></span></p>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">Some approaches to statistical machine translation
incorporate linguistic knowledge into word-aligned text
to improve automatic word alignment and
machine translation quality. This is accomplished with two
annotation schemes: alignment and tagging. Alignment identifies
minimum translation units and translation relations using
minimum-match and attachment annotation approaches. The tagging
scheme defines a set of word tags and alignment link tags
to describe these translation units and relations.
Tagging adds contextual, syntactic and language-specific
features to the alignment annotation. <o:p></o:p></span></p>
<blockquote>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">The Chinese word alignment tasks consisted of the
following components: <o:p></o:p></span></p>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">Identifying, aligning, and tagging 8 different
types of links<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">Identifying, attaching, and tagging local-level
unmatched words<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">Identifying and tagging sentence/discourse-level
unmatched words<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">Identifying and tagging all instances of Chinese
</span><span style="font-family:'MS Gothic';mso-ascii-font-family:Calibri;mso-ascii-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">的</span><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"> (DE)
except
when they were part of a semantic link.<o:p></o:p></span></p>
</blockquote>
<br>
<p class="MsoNormal" align="center">*<br>
<br>
</p>
<p class="MsoNormal"><a name="gale2"></a><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">(2) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T18">GALE
Phase
2 Arabic Broadcast News Parallel Text</a> was developed by
LDC. Along with other corpora, the parallel text in this
release comprised training data for Phase 2 of the DARPA GALE
(Global Autonomous Language Exploitation) Program. This corpus
contains Modern Standard Arabic source text and corresponding
English translations selected from broadcast news (BN) data
collected by LDC between 2005 and 2007 and transcribed by LDC or
under its direction.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">GALE Phase 2 Arabic Broadcast News Parallel Text
includes seven source-translation pairs, comprising 29,210 words
of Arabic source text and its English translation. Data is drawn
from six distinct Arabic programs broadcast between 2005 and
2007 from Abu Dhabi TV, based in Abu Dhabi, United Arab
Emirates; Al Alam News Channel, based in Iran; Aljazeera, a
regional broadcast programmer based in Doha, Qatar; Dubai TV,
based in Dubai, United Arab Emirates; and Kuwait TV, a national
television station based in Kuwait. The BN programming in this
release focuses on current events topics. <o:p></o:p></span></p>
<p class="MsoNormal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">The files in this release were transcribed by LDC
staff and/or transcription vendors under contract to LDC in
accordance with the <a
href="http://projects.ldc.upenn.edu/gale/Transcription/Arabic-XTransQRTR.V3.pdf">Quick
Rich
Transcription</a> guidelines developed by LDC. Transcribers
indicated sentence boundaries in addition to transcribing the
text. Data was manually selected for translation according to
several criteria, including linguistic features, transcription
features and topic features. The transcribed and segmented files
were then reformatted into a human-readable translation format
and assigned to translation vendors. Translators followed LDC's
Arabic-to-English translation guidelines. Bilingual LDC staff
performed quality control procedures on the completed
translations.<o:p></o:p></span></p>
<hr size="2" width="100%"><span class="moz-txt-tag"></span><br>
<br>
<pre class="moz-signature" cols="72">--
Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium        Phone: 1 (215) 573-1275
University of Pennsylvania        Fax: 1 (215) 573-2175
3600 Market St., Suite 810        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA        <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>
</pre>
</body>
</html>