<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<span style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold"></span>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
auto;text-align:center;line-height:normal" align="center"><b><a
href="#scholar"><span style="font-size:12.0pt;
mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin">Spring 2013 LDC Data
Scholarship Program<o:p></o:p></span></a></b></p>
<b><span style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></b>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
auto;text-align:center;line-height:normal" align="center"><i
style="mso-bidi-font-style:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">New
publications:<o:p></o:p></span></i></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
auto;text-align:center;line-height:normal" align="center"><a
href="#giga"><b><span style="font-size:12.0pt;
mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin">Annotated English Gigaword<o:p></o:p></span></b></a></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
auto;text-align:center;line-height:normal" align="center"><a
href="#semi"><b><span style="font-size:12.0pt;
mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin">Chinese-English
Semiconductor Parallel Text<br>
</span></b></a><b style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"><br>
<a href="#gale"><span style="mso-bidi-font-weight:bold">GALE
Phase 2 Arabic Newswire Parallel Text</span></a></span></b></p>
<b style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt; mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"><span
style="mso-bidi-font-weight:bold"><span style="color:black"><o:p></o:p></span></span></span></b>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold"></span></p>
<hr size="2" width="100%">
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold"></span></p>
<span style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold"> <o:p></o:p></span>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
auto;text-align:center;line-height:normal" align="center"><a
name="scholar"></a><b><span style="font-size:12.0pt;
mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black">Spring 2013 LDC
Data Scholarship Program</span></b><i
style="mso-bidi-font-style:normal"><span style="font-size:
12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black;mso-bidi-font-weight:bold"></span></i></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold">Applications are now being accepted
through January 15, 2013, 11:59PM EST for the Spring 2013 LDC
Data Scholarship program! The LDC Data Scholarship program
provides university students with access to LDC data at no cost.
During previous program cycles, LDC has awarded no-cost copies
of LDC data to over 25 individual students and student research
groups.</span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold">This program is open to students
pursuing undergraduate or graduate studies at an
accredited college or university. LDC Data Scholarships are not
restricted to any particular field of study; however, students
must demonstrate a well-developed research agenda and a bona
fide inability to pay. The selection process is highly
competitive. </span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold">The application consists of two
parts: </span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold">(1) Data Use Proposal. Applicants
must submit a proposal describing their intended use of the
data. The proposal should state which data the student plans to
use, explain how the data will benefit the research project, and
describe the proposed methodology or algorithm.</span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold">Applicants should consult the </span><a
href="http://www.ldc.upenn.edu/Catalog/index.jsp"
target="_blank"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";color:#0000CC;
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:
bold">LDC Corpus Catalog</span></a><span
style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;mso-bidi-font-weight:bold">
for a complete list of data distributed by LDC. Due to certain
restrictions, a handful of LDC corpora are available only to members
of the Consortium. Applicants are advised to select no more than
two datasets; students may apply for additional datasets
during the following cycle once they have completed processing
of the initial datasets and published or presented work in a
juried venue.</span><span style="font-size:12.0pt;
mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold">(2) Letter of Support. Applicants
must submit one letter of support from their thesis adviser or
department chair. The letter must verify the student's need for
data and confirm that the department or university lacks the
funding to pay the full Non-member Fee for the data or to join
the Consortium.</span><span
style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">
</span><span style="font-size:12.0pt;mso-fareast-font-family:
"Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold">For further information on
application materials and program rules, please visit the </span><a
href="http://www.ldc.upenn.edu/About/scholarships.html"
target="_blank"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;color:#0000CC;mso-bidi-font-weight:
bold">LDC Data Scholarship</span></a><span
style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">
page. </span><span style="font-size:12.0pt;
mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:
bold">Students can email their applications to the </span><a
href="mailto:datascholarships@ldc.upenn.edu"><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";color:#0000CC;mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">LDC
Data Scholarship program</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold">. Decisions will be sent by email
from the same address.</span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold">The deadline for the Spring 2013
program cycle is January 15, 2013, 11:59PM EST.</span><span
style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p> </o:p></span></p>
<span style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span>
<p class="MsoNormal" style="margin-bottom:12.0pt;text-align:center;
line-height:normal" align="center"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;
mso-bidi-font-weight:bold"><o:p> </o:p></span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:
bold"><br>
<br>
<b>New publications</b></span><b
style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></b></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><a name="giga"></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(1)
</span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T21"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">Annotated English
Gigaword</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> was developed by </span><a
href="http://hltcoe.jhu.edu/"><span
style="font-size:12.0pt;mso-fareast-font-family: "Times
New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Johns
Hopkins
University's Human Language Technology Center of Excellence</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">. It adds
automatically-generated syntactic and discourse structure
annotation to English Gigaword Fifth Edition (</span><a
href="http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2011T07"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">LDC2011T07</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">) and also contains an
API and tools for reading the dataset's XML files. The goal of
the annotation is to provide a standardized corpus for knowledge
extraction and distributional semantics, enabling broader
involvement by researchers in large-scale knowledge-acquisition
efforts.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Annotated
English Gigaword contains the nearly ten million documents (over
four billion words) of the original English Gigaword Fifth
Edition from seven news sources:<o:p></o:p></span></p>
<ul type="disc">
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Agence
France-Presse, English Service (afp_eng)<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Associated
Press Worldstream, English Service (apw_eng) <o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Central
News Agency of Taiwan, English Service (cna_eng) <o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Los
Angeles Times/Washington Post Newswire Service (ltw_eng) <o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Washington
Post/Bloomberg Newswire Service (wpb_eng)<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">New
York Times Newswire Service (nyt_eng) <o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Xinhua
News Agency, English Service (xin_eng) <o:p></o:p></span></li>
</ul>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The
following layers of annotation were added:<o:p></o:p></span></p>
<ul type="disc">
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Tokenized
and segmented sentences<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Treebank-style
constituent parse trees<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Syntactic
dependency trees<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Named
entities<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">In-document
coreference chains<o:p></o:p></span></li>
</ul>
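<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span style="font-size:12.0pt">As a rough
illustration of the constituent-parse layer, the sketch below pulls
(tag, word) pairs out of a Penn Treebank-style bracketed string. The
sample sentence, the name leaf_pairs, and the assumption that a parse
is available as a plain bracketed string are illustrative only; the
corpus documentation describes the actual XML layout and the bundled
API.</span></p>
<pre>
```python
import re

# Matches the innermost "(TAG word)" units of a bracketed parse.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def leaf_pairs(parse):
    """Return (tag, word) pairs for the leaves of a bracketed parse string."""
    return LEAF.findall(parse)


# Hypothetical sample parse, not taken from the corpus.
sample = "(S (NP (DT The) (NN cat)) (VP (VBD sat)))"
print(leaf_pairs(sample))
```
</pre>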
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The
annotation was performed in a three-step process: (1) the data
was preprocessed and sentences selected for annotation
(sentences with more than 100 tokens were excluded); (2)
syntactic parses were derived; and (3) the parsed output was
post-processed to derive syntactic dependencies, named entities
and coreference chains. Over 183 million sentences were parsed.
<o:p></o:p></span></p>
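<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span style="font-size:12.0pt">The length
filter in step (1) can be sketched as follows. This is a minimal
illustration, not the developers' code: it assumes sentences are
already split and counts whitespace-separated tokens, whereas the
tokenizer actually used is not described here. The name
select_for_annotation is hypothetical.</span></p>
<pre>
```python
MAX_TOKENS = 100  # threshold stated in the description above


def select_for_annotation(sentences):
    """Keep sentences that do not exceed MAX_TOKENS tokens.

    Tokens are approximated by whitespace splitting (an assumption).
    """
    return [s for s in sentences if not len(s.split()) > MAX_TOKENS]


short = "Stocks rose on Monday ."
too_long = " ".join(["word"] * 101)  # 101 tokens, so it is excluded
print(select_for_annotation([short, too_long]))
```
</pre>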
<span style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
auto;text-align:center;line-height:normal" align="center"><span
style="font-size:12.0pt; mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin">*<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><a name="semi"></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><span
style="mso-spacerun:yes"> </span>(2) </span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T22"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">Chinese-English
Semiconductor Parallel Text</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">
was developed by </span><a href="http://www.mitre.org/"><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin">The MITRE Corporation</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">. It consists of
parallel sentences from a collection of abstracts from
scientific articles on semiconductors published in Mandarin and
translated into English by translators with particular expertise
in the technical area. Translators were instructed to err on the
side of literal translation if required, but to maintain the
technical writing style of the source and to make the resulting
English as natural as possible. The translators followed
specific guidelines for translation, and those are included in
this distribution.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">There
are 2,169 lines of parallel Mandarin and English, with a total
of 125,302 characters of Mandarin and 64,851 words of English,
presented in a separate UTF-8 plain text file for each language.
The sentences were translated in sequential order but are presented
in scrambled order. Both files use the same scrambled order, so
sentences at identical line numbers are translations of each other.
For example, the 31st line of the English file is a translation of
the 31st line of the Mandarin file. The original line sequence is
not provided.<o:p></o:p></span></p>
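<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span style="font-size:12.0pt">Because the two
files are aligned by line number, sentence pairs can be recovered by
reading them in parallel. The sketch below assumes two UTF-8 text
files with one sentence per line; the function name read_parallel and
the file names in the comment are hypothetical, so substitute the
names used in the distribution.</span></p>
<pre>
```python
def read_parallel(zh_path, en_path):
    """Pair line i of the Mandarin file with line i of the English file."""
    with open(zh_path, encoding="utf-8") as zh, open(en_path, encoding="utf-8") as en:
        zh_lines = [line.rstrip("\n") for line in zh]
        en_lines = [line.rstrip("\n") for line in en]
    # Line-aligned files must have the same number of lines.
    if len(zh_lines) != len(en_lines):
        raise ValueError("files are not line-aligned")
    return list(zip(zh_lines, en_lines))


# Example use (hypothetical file names):
#   pairs = read_parallel("mandarin.txt", "english.txt")
#   pairs[30] is the 31st sentence pair
```
</pre>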
<span style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span>
<p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;
text-align:center;line-height:normal" align="center"><span
style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">*<br>
</span></p>
<p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;
text-align:center;line-height:normal" align="center"><br>
<span style="font-size:12.0pt;mso-fareast-font-family: "Times
New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><a name="gale"></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(3)
</span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T17"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">GALE Phase 2 Arabic
Newswire Parallel Text</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">
was developed by LDC.<span style="mso-spacerun:yes"> </span>Along
with other corpora, the parallel text in this release comprised
training data for Phase 2 of the DARPA GALE (Global Autonomous
Language Exploitation) Program. This corpus contains Modern
Standard Arabic source text and corresponding English
translations selected from newswire data collected in 2007 by
LDC and transcribed by LDC or under its direction.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">GALE
Phase 2 Arabic Newswire Parallel Text includes 400
source-translation pairs, comprising 181,704 tokens of Arabic
source text and its English translation. Data is drawn from six
distinct Arabic newswire sources: Al Ahram, Al Hayat, Al-Quds
Al-Arabi, An Nahar, Asharq Al-Awsat and Assabah. <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The
files in this release were transcribed by LDC staff and/or
transcription vendors under contract to LDC in accordance with
the </span><a
href="http://projects.ldc.upenn.edu/gale/Transcription/Arabic-XTransQRTR.V3.pdf"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">Quick Rich
Transcription</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> guidelines developed
by LDC. Transcribers indicated sentence boundaries in addition
to transcribing the text. Data was manually selected for
translation according to several criteria, including linguistic
features, transcription features and topic features. The
transcribed and segmented files were then reformatted into a
human-readable translation format and assigned to translation
vendors. Translators followed LDC's Arabic to English
translation guidelines. Bilingual LDC staff performed quality
control procedures on the completed translations.<o:p></o:p></span></p>
<br>
<hr size="2" width="100%">
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></p>
<pre class="moz-signature" cols="72">--
Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium Phone: 1 (215) 573-1275
University of Pennsylvania Fax: 1 (215) 573-2175
3600 Market St., Suite 810 <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>
</pre>
</body>
</html>