<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
      auto;text-align:center;line-height:normal" align="center"><b><a
          href="#scholar"><span style="font-size:12.0pt;
            mso-fareast-font-family:"Times New
            Roman";mso-bidi-font-family:Calibri;
            mso-bidi-theme-font:minor-latin">Spring 2013 LDC Data
            Scholarship Program<o:p></o:p></span></a></b></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
      auto;text-align:center;line-height:normal" align="center"><i
        style="mso-bidi-font-style:normal"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";mso-bidi-font-family:
          Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">New








          publications:<o:p></o:p></span></i></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
      auto;text-align:center;line-height:normal" align="center"><a
        href="#giga"><b><span style="font-size:12.0pt;
            mso-fareast-font-family:"Times New
            Roman";mso-bidi-font-family:Calibri;
            mso-bidi-theme-font:minor-latin">Annotated English Gigaword<o:p></o:p></span></b></a></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
      auto;text-align:center;line-height:normal" align="center"><a
        href="#semi"><b><span style="font-size:12.0pt;
            mso-fareast-font-family:"Times New
            Roman";mso-bidi-font-family:Calibri;
            mso-bidi-theme-font:minor-latin">Chinese-English
            Semiconductor Parallel Text<br>
          </span></b></a><b style="mso-bidi-font-weight:normal"><span
          style="font-size:12.0pt; mso-fareast-font-family:"Times
          New Roman";mso-bidi-font-family:Calibri;
          mso-bidi-theme-font:minor-latin"><br>
          <a href="#gale"><span style="mso-bidi-font-weight:bold">GALE
              Phase 2 Arabic Newswire Parallel Text</span></a></span></b></p>
    <hr size="2" width="100%">
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
      auto;text-align:center;line-height:normal" align="center"><a
        name="scholar"></a><b><span style="font-size:12.0pt;
          mso-fareast-font-family:"Times New
          Roman";mso-bidi-font-family:Calibri;
          mso-bidi-theme-font:minor-latin;color:black">Spring 2013 LDC
          Data Scholarship Program</span></b></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;


        mso-bidi-font-weight:bold">Applications are now being accepted
        through January 15, 2013, 11:59PM EST for the Spring 2013 LDC
        Data Scholarship program! The LDC Data Scholarship program
        provides university students with access to LDC data at no
        cost. During previous program cycles, LDC has awarded no-cost
        copies of LDC data to over 25 individual students and student
        research groups.</span><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;


        mso-bidi-font-weight:bold">This program is open to students
        pursuing undergraduate or graduate studies at an accredited
        college or university. LDC Data Scholarships are not
        restricted to any particular field of study; however, students
        must demonstrate a well-developed research agenda and a bona
        fide inability to pay. The selection process is highly
        competitive. </span><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;


        mso-bidi-font-weight:bold">The application consists of two
        parts: </span><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;


        mso-bidi-font-weight:bold">(1) Data Use Proposal. Applicants
        must submit a proposal describing their intended use of the
        data. The proposal should state which data the student plans to
        use and how the data will benefit their research project, and
        it should describe the proposed methodology or algorithm.</span><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;


        mso-bidi-font-weight:bold">Applicants should consult the </span><a
        href="http://www.ldc.upenn.edu/Catalog/index.jsp"
        target="_blank"><span
          style="font-size:12.0pt;mso-fareast-font-family:&quot;Times
          New Roman&quot;;color:#0000CC;
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:
          bold">LDC Corpus Catalog</span></a><span
        style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;mso-bidi-font-weight:bold">
        for a complete list of data distributed by LDC. Due to certain
        restrictions, a handful of LDC corpora are available only to
        members of the Consortium. Applicants are advised to select no
        more than one or two datasets; students may apply for additional
        datasets during the following cycle once they have finished
        processing the initial datasets and have published or presented
        work in a juried venue.</span><span style="font-size:12.0pt;
        mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;


        mso-bidi-font-weight:bold">(2) Letter of Support. Applicants
        must submit one letter of support from their thesis adviser or
        department chair. The letter must verify the student's need for
        data and confirm that the department or university lacks the
        funding to pay the full Non-member Fee for the data or to join
        the Consortium.</span><span
        style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">
      </span><span style="font-size:12.0pt;mso-fareast-font-family:
        "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;


        mso-bidi-font-weight:bold">For further information on
        application materials and program rules, please visit the </span><a
        href="http://www.ldc.upenn.edu/About/scholarships.html"
        target="_blank"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";mso-bidi-font-family:
          Calibri;mso-bidi-theme-font:minor-latin;color:#0000CC;mso-bidi-font-weight:


          bold">LDC Data Scholarship</span></a><span
        style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">
        page. </span><span style="font-size:12.0pt;
        mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:


        bold">Students can email their applications to the </span><a
        href="mailto:datascholarships@ldc.upenn.edu"><span
          style="font-size:12.0pt;mso-fareast-font-family:&quot;Times
          New Roman&quot;;color:#0000CC;mso-bidi-font-family:
          Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">LDC
          Data Scholarship program</span></a><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;


        mso-bidi-font-weight:bold">. Decisions will be sent by email
        from the same address.</span><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;


        mso-bidi-font-weight:bold">The deadline for the Spring 2013
        program cycle is January 15, 2013, 11:59PM EST.</span><span
        style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
    <p class="MsoNormal" style="margin-bottom:12.0pt;text-align:center;
      line-height:normal" align="center"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;


        mso-bidi-font-weight:bold"><o:p> </o:p></span><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:


        bold"><br>
        <br>
        <b>New publications</b></span><b
        style="mso-bidi-font-weight:normal"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";mso-bidi-font-family:
          Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></b></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><a name="giga"></a><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(1)


      </span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T21"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";mso-bidi-font-family:
          Calibri;mso-bidi-theme-font:minor-latin">Annotated English
          Gigaword</span></a><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin"> was developed by </span><a
        href="http://hltcoe.jhu.edu/"><span
          style="font-size:12.0pt;mso-fareast-font-family: "Times
          New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Johns
Hopkins


          University's Human Language Technology Center of Excellence</span></a><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin">. It adds
        automatically generated syntactic and discourse structure
        annotation to English Gigaword Fifth Edition (</span><a
href="http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2011T07"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";mso-bidi-font-family:
          Calibri;mso-bidi-theme-font:minor-latin">LDC2011T07</span></a><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin">) and also contains an
        API and tools for reading the dataset's XML files. The goal of
        the annotation is to provide a standardized corpus for knowledge
        extraction and distributional semantics that lets more
        researchers take part in large-scale knowledge acquisition
        efforts.<o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Annotated


        English Gigaword contains the nearly ten million documents (over
        four billion words) of the original English Gigaword Fifth
        Edition from seven news sources:<o:p></o:p></span></p>
    <ul type="disc">
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Agence



          France-Presse, English Service (afp_eng)<o:p></o:p></span></li>
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Associated



          Press Worldstream, English Service (apw_eng) <o:p></o:p></span></li>
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Central


          News Agency of Taiwan, English Service (cna_eng) <o:p></o:p></span></li>
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Los


          Angeles Times/Washington Post Newswire Service (ltw_eng) <o:p></o:p></span></li>
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Washington



          Post/Bloomberg Newswire Service (wpb_eng)<o:p></o:p></span></li>
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">New


          York Times Newswire Service (nyt_eng) <o:p></o:p></span></li>
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Xinhua


          News Agency, English Service (xin_eng) <o:p></o:p></span></li>
    </ul>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The


        following layers of annotation were added:<o:p></o:p></span></p>
    <ul type="disc">
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Tokenized



          and segmented sentences<o:p></o:p></span></li>
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Treebank-style



          constituent parse trees<o:p></o:p></span></li>
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Syntactic



          dependency trees<o:p></o:p></span></li>
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Named



          entities<o:p></o:p></span></li>
      <li class="MsoNormal"
        style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
        line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";
          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">In-document



          coreference chains<o:p></o:p></span></li>
    </ul>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The


        annotation was performed in a three-step process: (1) the data
        was preprocessed and sentences selected for annotation
        (sentences with more than 100 tokens were excluded); (2)
        syntactic parses were derived; and (3) the parsed output was
        post-processed to derive syntactic dependencies, named entities
        and coreference chains. Over 183 million sentences were parsed.
        <o:p></o:p></span></p>
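    <p class="MsoNormal" style="line-height:normal"><span
        style="font-size:12.0pt">The length filter in step (1) can be
        illustrated with a minimal Python sketch. It is not the HLTCOE
        pipeline: it assumes each sentence has already been split into a
        list of tokens by some tokenizer (the tokenizer actually used is
        not described here), and the names MAX_TOKENS and
        select_for_parsing are hypothetical.</span></p>
<pre>
```python
from typing import Iterable, Iterator, List

# Hypothetical constant name; the release notes state only that sentences
# with more than 100 tokens were excluded from annotation.
MAX_TOKENS = 100


def select_for_parsing(sentences: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield only the tokenized sentences that are short enough to parse.

    Each sentence is assumed to be a list of tokens produced earlier by a
    tokenizer; a sentence of exactly MAX_TOKENS tokens is still kept.
    """
    for tokens in sentences:
        if len(tokens) > MAX_TOKENS:
            continue  # too long: skipped, mirroring step (1)
        yield tokens
```
</pre>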
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
      auto;text-align:center;line-height:normal" align="center"><span
        style="font-size:12.0pt; mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin">*<o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><a name="semi"></a><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><span
          style="mso-spacerun:yes"> </span>(2) </span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T22"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";mso-bidi-font-family:
          Calibri;mso-bidi-theme-font:minor-latin">Chinese-English
          Semiconductor Parallel Text</span></a><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">
        was developed by </span><a href="http://www.mitre.org/"><span
          style="font-size:12.0pt; mso-fareast-font-family:"Times
          New Roman";mso-bidi-font-family:Calibri;
          mso-bidi-theme-font:minor-latin">The MITRE Corporation</span></a><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin">. It consists of
        parallel sentences from a collection of abstracts from
        scientific articles on semiconductors published in Mandarin and
        translated into English by translators with particular expertise
        in the technical area. Translators were instructed to err on the
        side of literal translation if required, but to maintain the
        technical writing style of the source and to make the resulting
        English as natural as possible. The translators followed
        specific translation guidelines, which are included in this
        distribution.<o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">There


        are 2,169 lines of parallel Mandarin and English, with a total
        of 125,302 characters of Mandarin and 64,851 words of English,
        presented in a separate UTF-8 plain text file for each language.
        The sentences were translated in their original sequential order
        but are presented in scrambled order; parallel sentences at
        identical line numbers are translations of each other. For
        example, the 31st line of the English file is a translation of
        the 31st line of the Mandarin file. The original line sequence
        is not provided.<o:p></o:p></span></p>
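    <p class="MsoNormal" style="line-height:normal"><span
        style="font-size:12.0pt">Because the two files are aligned by
        line number alone, reading them as pairs only requires matching
        lines by position. The following is a minimal Python sketch under
        stated assumptions: the function names are hypothetical, the
        file paths are placeholders (the actual file names in the release
        are not given here), and each file is assumed to hold exactly one
        segment per line.</span></p>
<pre>
```python
from typing import Iterator, List, Tuple


def read_lines(path: str) -> List[str]:
    """Read a UTF-8 text file and return its lines without line endings."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def read_parallel(zh_path: str, en_path: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, mandarin, english) for each aligned pair.

    Line N of the Mandarin file is assumed to translate line N of the
    English file; numbering starts at 1, matching the 31st-line example.
    """
    zh_lines = read_lines(zh_path)
    en_lines = read_lines(en_path)
    if len(zh_lines) != len(en_lines):
        raise ValueError(
            f"line counts differ: {len(zh_lines)} Mandarin vs {len(en_lines)} English"
        )
    for number, (zh, en) in enumerate(zip(zh_lines, en_lines), start=1):
        yield number, zh, en
```
</pre>
    <p class="MsoNormal" style="line-height:normal"><span
        style="font-size:12.0pt">The sketch raises an error when the line
        counts differ instead of letting zip silently drop the unmatched
        tail: with no original sequence to fall back on, a count mismatch
        means the positional pairing can no longer be trusted.</span></p>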
    <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;
      text-align:center;line-height:normal" align="center"><span
        style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">*<br>
      </span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><a name="gale"></a><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(3)


      </span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T17"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";mso-bidi-font-family:
          Calibri;mso-bidi-theme-font:minor-latin">GALE Phase 2 Arabic
          Newswire Parallel Text</span></a><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">
        was developed by LDC. Along with other corpora, the parallel
        text in this release served as training data for Phase 2 of the
        DARPA GALE (Global Autonomous
        Language Exploitation) Program. This corpus contains Modern
        Standard Arabic source text and corresponding English
        translations selected from newswire data collected in 2007 by
        LDC and transcribed by LDC or under its direction.<o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">GALE


        Phase 2 Arabic Newswire Parallel Text includes 400
        source-translation pairs, comprising 181,704 tokens of Arabic
        source text and its English translation. Data is drawn from six
        distinct Arabic newswire sources: Al Ahram, Al Hayat, Al-Quds
        Al-Arabi, An Nahar, Asharq Al-Awsat and Assabah. <o:p></o:p></span></p>
    <p class="MsoNormal"
      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
      line-height:normal"><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";
        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The


        files in this release were transcribed by LDC staff and/or
        transcription vendors under contract to LDC in accordance with
        the </span><a
href="http://projects.ldc.upenn.edu/gale/Transcription/Arabic-XTransQRTR.V3.pdf"><span
          style="font-size:12.0pt;mso-fareast-font-family:"Times
          New Roman";mso-bidi-font-family:
          Calibri;mso-bidi-theme-font:minor-latin">Quick Rich
          Transcription</span></a><span
        style="font-size:12.0pt;mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin"> guidelines developed
        by LDC. Transcribers indicated sentence boundaries in addition
        to transcribing the text. Data was manually selected for
        translation according to several criteria, including linguistic
        features, transcription features and topic features. The
        transcribed and segmented files were then reformatted into a
        human-readable translation format and assigned to translation
        vendors. Translators followed LDC's Arabic to English
        translation guidelines. Bilingual LDC staff performed quality
        control procedures on the completed translations.<o:p></o:p></span></p>
    <br>
    <hr size="2" width="100%">
    <pre class="moz-signature" cols="72">-- 
--

Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium                  Phone: 1 (215) 573-1275
University of Pennsylvania                    Fax: 1 (215) 573-2175
3600 Market St., Suite 810                        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA                 <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>


</pre>
  </body>
</html>