<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">


    <p class="MsoNormal" align="center"><a href="#scholar"><b><span

            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

            minor-latin">-  Fall 2012 LDC Data Scholarship Recipients  -<o:p></o:p></span></b></a></p>

    <span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

      minor-latin"><o:p></o:p></span>

    <p class="MsoNormal" align="center"><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin"><a href="#wiki"><b>-  Language Resource Wiki  -</b></a><o:p></o:p></span></p>

    <p class="MsoNormal" align="center"><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin"><i>New publications:</i><o:p></o:p></span></p>

    <p class="MsoNormal" align="center"><a href="#gale1"><b><span

            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

            minor-latin">-  GALE Chinese-English Word Alignment and

            Tagging Training Part 2 -- Newswire  -<o:p></o:p></span></b></a></p>

    <p class="MsoNormal" align="center"><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin"><a href="#gale2"><b>-  GALE Phase 2 Arabic

            Broadcast News Parallel Text  -</b></a></span></p>

    <span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

      minor-latin"><o:p></o:p></span>


    <hr size="2" width="100%">

    <p class="MsoNormal" align="center"><a name="scholar"></a><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin"><b>Fall 2012 LDC Data Scholarship Recipients</b></span><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin"><o:p></o:p></span></p>

    <p class="MsoNormal"><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin">LDC is pleased to announce the student recipients

        of the Fall 2012 LDC Data Scholarship program!  This program

        provides university and college students with access to LDC data

        at no cost. Students were asked to complete an application that
        consisted of a proposal describing their intended use of the
        data, as well as a letter of support from their thesis adviser.
        We received many solid applications and have chosen six
        proposals to support. The

        following students received no-cost copies of LDC data:<o:p></o:p></span></p>

    <blockquote>

      <p class="MsoNormal"><span

          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

          minor-latin">Jaffar Atwan - National University of Malaysia

          (Malaysia), PhD candidate, Information Science and

          Technology.  Jaffar has been awarded a copy of Arabic Newswire

          Part 1 (LDC2001T55) for his work in information retrieval.<br>

          <br>

          Sarath Chandar - Indian Institute of Technology, Madras

          (India), MS candidate, Computer Science and Engineering. 

          Sarath has been awarded a copy of Treebank-3 (LDC99T42) for

          his work in grammar induction.<br>

          <br>

          Kuruvachan K. George - Amrita Vishwa Vidyapeetham (India), PhD
          candidate, Electrical and Computer Engineering.  Kuruvachan
          has been awarded copies of Fisher English Part 2
          (LDC2005S13/T19) and 2008 <a name="top">NIST Speaker
          Recognition Evaluation</a> data (LDC2011S05/07/08/11) for his work in

          speaker recognition.<o:p></o:p></span></p>

      <p class="MsoNormal"><span

          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

          minor-latin">Eduardo Motta - Pontifícia Universidade Católica

          do Rio de Janeiro (Brazil), PhD candidate, Information

          Sciences.  Eduardo has been awarded a copy of English Web

          Treebank (LDC2012T13) for his work in machine learning.<o:p></o:p></span></p>

      <p class="MsoNormal"><span

          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

          minor-latin">Genevieve Sapijaszko - University of Central

          Florida (USA), PhD candidate, Electrical and Computer
          Engineering.  Genevieve
          has been awarded copies of TIMIT Acoustic-Phonetic Continuous

          Speech Corpus (LDC93S1) and YOHO Speaker Verification

          (LDC94S16) for her work in digital signal processing.<br>

          <br>

          John Steinberg - Temple University (USA), MS

          candidate, Electrical and Computer Engineering.  John has been

          awarded copies of CALLHOME Mandarin Chinese Lexicon (LDC96L15)

          and CALLHOME Mandarin Chinese Transcripts (LDC96T16) for his

          work in speech recognition.<o:p></o:p></span></p>

    </blockquote>

    <span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

      minor-latin"><br>

      <o:p></o:p></span>

    <p class="MsoNormal" align="center"><a name="wiki"></a><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin"><b>Language Resource Wiki</b><o:p></o:p></span></p>

    <p class="MsoNormal"><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin"><o:p></o:p></span>The <a

        href="http://lrwiki.ldc.upenn.edu/">Language Resource Wiki</a>

      catalogs data, software, descriptive grammars and other resources
      for a variety of languages, especially those with few
      generally available resources for research. LDC is actively

      seeking editors knowledgeable in these and other languages to

      develop and maintain the pages, which are readable by anyone but

      writable only by editors. The wiki currently has resource listings

      for: Bengali, Berber, Breton, Ewe, Greek (Ancient), Indonesian,

      Hindi, Latin, Panjabi, Pashto, Sorani (Central Kurdish), Russian,

      Tagalog, Tamil, and Urdu, and for the following Sign Languages:

      American, British, Catalan, Dutch, Flemish, German, Japanese, New

      Zealand, Polish, Spanish, and Swiss German. <span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><br>

        <br>

        <o:p></o:p></span></p>

    <p class="MsoNormal" align="center"><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p><b>New

            publications</b><br>

        </o:p></span></p>

    <p class="MsoNormal"><a name="gale1"></a><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin">(1) <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T18">GALE

Chinese-English

          Word Alignment and Tagging Training Part 2 -- Newswire</a> was

        developed by LDC and contains 169,080 tokens of word-aligned

        Chinese and English parallel text enriched with linguistic tags.

        This material was used as training data in the <a

          href="http://projects.ldc.upenn.edu/gale/index.html">DARPA

          GALE</a> (Global Autonomous Language Exploitation) program. <o:p></o:p></span></p>

    <p class="MsoNormal"><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin">Some approaches to statistical machine translation

        incorporate linguistic knowledge into word-aligned text
        to improve automatic word alignment and

        machine translation quality. This is accomplished with two

        annotation schemes: alignment and tagging. Alignment identifies

        minimum translation units and translation relations by using

        minimum-match and attachment annotation approaches. The tagging
        scheme defines a set of word tags and alignment link tags
        that describe these translation units and relations.

        Tagging adds contextual, syntactic and language-specific

        features to the alignment annotation. <o:p></o:p></span></p>

    <blockquote>

      <p class="MsoNormal"><span

          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

          minor-latin">The Chinese word alignment tasks consisted of the

          following components: <o:p></o:p></span></p>

      <p class="MsoNormal"><span

          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

          minor-latin">Identifying, aligning, and tagging 8 different

          types of links<o:p></o:p></span></p>

      <p class="MsoNormal"><span

          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

          minor-latin">Identifying, attaching, and tagging local-level

          unmatched words<o:p></o:p></span></p>

      <p class="MsoNormal"><span

          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

          minor-latin">Identifying and tagging sentence/discourse-level

          unmatched words<o:p></o:p></span></p>

      <p class="MsoNormal"><span

          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

          minor-latin">Identifying and tagging all instances of Chinese

        </span><span style="font-family:'MS Gothic';mso-ascii-font-family:Calibri;mso-ascii-theme-font:minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">的</span><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"> (DE),
          except when they were part of a semantic link.<o:p></o:p></span></p>

    </blockquote>

    <br>

    <p class="MsoNormal" align="center">*<br>

      <br>

    </p>

    <p class="MsoNormal"><a name="gale2"></a><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin">(2) <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T18">GALE

Phase

          2 Arabic Broadcast News Parallel Text</a> was developed by

        LDC. Along with other corpora, the parallel text in this

        release comprised training data for Phase 2 of the DARPA GALE

        (Global Autonomous Language Exploitation) Program. This corpus

        contains Modern Standard Arabic source text and corresponding

        English translations selected from broadcast news (BN) data

        collected by LDC between 2005 and 2007 and transcribed by LDC or

        under its direction.<o:p></o:p></span></p>

    <p class="MsoNormal"><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin">GALE Phase 2 Arabic Broadcast News Parallel Text

        includes seven source-translation pairs, comprising 29,210 words

        of Arabic source text and its English translation. Data is drawn

        from six distinct Arabic programs broadcast between 2005 and

        2007 from Abu Dhabi TV, based in Abu Dhabi, United Arab

        Emirates; Al Alam News Channel, based in Iran; Aljazeera, a

        regional broadcast programmer based in Doha, Qatar; Dubai TV,

        based in Dubai, United Arab Emirates; and Kuwait TV, a national

        television station based in Kuwait. The BN programming in this

        release focuses on current events topics. <o:p></o:p></span></p>

    <p class="MsoNormal"><span

        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:

        minor-latin">The files in this release were transcribed by LDC

        staff and/or transcription vendors under contract to LDC in

        accordance with the <a

href="http://projects.ldc.upenn.edu/gale/Transcription/Arabic-XTransQRTR.V3.pdf">Quick

Rich

          Transcription</a> guidelines developed by LDC. Transcribers

        indicated sentence boundaries in addition to transcribing the

        text. Data was manually selected for translation according to

        several criteria, including linguistic features, transcription

        features and topic features. The transcribed and segmented files

        were then reformatted into a human-readable translation format

        and assigned to translation vendors. Translators followed LDC's

        Arabic-to-English translation guidelines. Bilingual LDC staff

        performed quality control procedures on the completed

        translations.<o:p></o:p></span></p>

    <hr size="2" width="100%"><span class="moz-txt-tag"></span><br>

    <br>

    <pre class="moz-signature" cols="72">-- 


Ilya Ahtaridis

Membership Coordinator

--------------------------------------------------------------------

Linguistic Data Consortium                  Phone: 1 (215) 573-1275

University of Pennsylvania                    Fax: 1 (215) 573-2175

3600 Market St., Suite 810                        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104 USA                 <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>

</pre>


  </body>

</html>