<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p class="MsoNormal" style="margin-bottom:12.0pt;line-height:normal"
align="center"><span style="font-size:12.0pt;
mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"><a href="#scholar"><b>- Spring
2013 LDC Data Scholarship Program -</b></a></span><a
href="#scholar"><br>
</a> </p>
<p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;
text-align:center;line-height:normal" align="center"><a
href="#pdtb"><b style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">- Penn Discourse
Treebank Version 2.0 Update -</span></b></a><b
style="mso-bidi-font-weight:normal"><br>
</b><i><span style="font-size:12.0pt;mso-fareast-font-family:
"Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><br>
New publications:</span></i></p>
<p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;
text-align:center;line-height:normal" align="center"><span
style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span><a
href="#gale"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">-
GALE Chinese-English Word Alignment and Tagging Training
Part 3 -- Web -<br>
</span></b></a></p>
<p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;
text-align:center;line-height:normal" align="center"><a
href="#russian"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">-
<b>Russian-English Computer Security Parallel Text</b> -</span></a></p>
<hr size="2" width="100%">
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal" align="center"><span style="font-size:12.0pt;
mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"> </span><a name="scholar"></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><b>Spring
2013 LDC Data Scholarship Program </b></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The
deadline for the Spring 2013 LDC Data Scholarship Program is one
month away! Student applications are being accepted now
through January 15, 2013<span
style="color:black;mso-bidi-font-weight:bold">, 11:59 PM EST</span>.
The LDC Data Scholarship program provides university students
with access to LDC data at no cost. This program is open to
students pursuing both undergraduate and graduate studies in an
accredited college or university. LDC Data Scholarships are not
restricted to any particular field of study; however, students
must demonstrate a well-developed research agenda and a bona
fide inability to pay. <br>
<br>
Students must complete an application that consists of
a data use proposal and a letter of support from their adviser.
For further information on application materials and program
rules, please visit the </span><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://www.ldc.upenn.edu/About/scholarships.html"><span
style="mso-fareast-font-family: "Times New Roman"">LDC
Data Scholarship</span></a></span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> page. <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Students
can email their applications to the </span><span
style="font-size:12.0pt;
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="mailto:datascholarships@ldc.upenn.edu"><span
style="mso-fareast-font-family: "Times New Roman"">LDC
Data Scholarship program</span></a></span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">. Decisions will be
sent by email from the same address.</span></p>
<p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;
text-align:center;line-height:normal" align="center"><a
name="pdtb"></a><b style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">Penn Discourse
Treebank Version 2.0 Update</span></b></p>
<small><span
style="font-family:"Calibri","sans-serif";mso-ascii-theme-font:minor-latin;mso-hansi-theme-font:minor-latin;mso-bidi-theme-font:minor-latin">The
developers
of the Penn Discourse Treebank (PDTB) Version 2.0 (<a
href="http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2008T05">LDC2008T05</a>)
have updated this release to add metadata to the Wall
Street Journal (WSJ) news stories in the corpus. The goal is to
help users read PDTB files as texts and to
distinguish texts of different genres within the WSJ. <o:p></o:p></span>
<br>
<span
style="font-family:"Calibri","sans-serif";mso-ascii-theme-font:minor-latin;mso-hansi-theme-font:minor-latin;mso-bidi-theme-font:minor-latin">The
metadata includes the following fields: <o:p></o:p></span></small>
<ul type="disc">
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><small><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">DD: the date the article appeared in the WSJ <o:p></o:p></span></small></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><small><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">AN: unique identifier for the article <o:p></o:p></span></small></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><small><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">HL: the headline and by-line of the article, along
with the column name for regular features (such as Who's News,
Marketing &amp; Media, Technology) <o:p></o:p></span></small></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><small><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">SO: the source of the article <o:p></o:p></span></small></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><small><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">IN: manually assigned codes or keywords for the
article <o:p></o:p></span></small></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><small><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">CO: manually assigned codes for companies or
other organizations <o:p></o:p></span></small></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><small><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">DATELINE: normally the location where the
article was filed, although the field sometimes holds
unexpected content <o:p></o:p></span></small></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><small><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">GV: branch of government or government agency
mentioned in the article <o:p></o:p></span></small></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><small><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">SBREAKS: the byte positions of section breaks
present in the file <o:p></o:p></span></small></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><small><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">ARTICLEBREAK: marks the boundary between articles in
files that contain more than one article <o:p></o:p></span></small></li>
</ul>
<small><span
style="font-family:"Calibri","sans-serif";mso-ascii-theme-font:minor-latin;mso-hansi-theme-font:minor-latin;mso-bidi-theme-font:minor-latin">Contact
LDC to obtain the update.</span></small><br>
<br>
<p class="MsoNormal" style="margin-bottom:12.0pt;text-align:center;
line-height:normal" align="center"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><br>
<b style="mso-bidi-font-weight: normal">New publications</b></span></big></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><big><a name="gale"></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(1)
</span><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T24"><span
style="mso-fareast-font-family:"Times New
Roman"">GALE Chinese-English Word Alignment and
Tagging Training Part 3 -- Web</span></a></span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> was developed by LDC
and contains 154,541 tokens of word-aligned Chinese and
English parallel text enriched with linguistic tags. This
material was used as training data in the </span><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://projects.ldc.upenn.edu/gale/index.html"><span
style="mso-fareast-font-family: "Times New
Roman"">DARPA GALE</span></a></span><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"> (Global Autonomous Language
Exploitation) program.<o:p></o:p></span></big></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Some
approaches to statistical machine translation incorporate
linguistic knowledge into word-aligned text to improve
automatic word alignment and machine translation quality.
This is accomplished with two annotation schemes: alignment
and tagging. Alignment identifies minimum translation units
and translation relations by using minimum-match and
attachment annotation approaches. The tagging scheme defines
a set of word tags and alignment link tags to describe these
translation units and relations. Tagging adds contextual,
syntactic and language-specific features to the alignment
annotation. <br>
</span></big></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">GALE
Chinese-English Word Alignment and Tagging Training Part 1 --
Newswire and Web (LDC2012T16) and GALE Chinese-English Word
Alignment and Tagging Training Part 2 -- Newswire (LDC2012T20) are
also available through LDC.<br>
<br>
This release consists of Chinese source web data (newsgroup,
weblog) collected by LDC in 2008 and 2009. The distribution by
words, character tokens and segments appears below: <o:p></o:p></span></big></p>
<table class="MsoNormalTable" style="mso-cellspacing:1.5pt;
mso-yfti-tbllook:1184" border="1" cellpadding="0">
<tbody>
<tr style="mso-yfti-irow:0;mso-yfti-firstrow:yes">
<td style="padding:.75pt .75pt .75pt .75pt"><big> </big>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Language<o:p></o:p></span></big></p>
<big> </big></td>
<td style="padding:.75pt .75pt .75pt .75pt"><big> </big>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Files<o:p></o:p></span></big></p>
<big> </big></td>
<td style="padding:.75pt .75pt .75pt .75pt"><big> </big>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Words<o:p></o:p></span></big></p>
<big> </big></td>
<td style="padding:.75pt .75pt .75pt .75pt"><big> </big>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Character tokens<o:p></o:p></span></big></p>
<big> </big></td>
<td style="padding:.75pt .75pt .75pt .75pt"><big> </big>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Segments<o:p></o:p></span></big></p>
<big> </big></td>
</tr>
<tr style="mso-yfti-irow:1;mso-yfti-lastrow:yes">
<td style="padding:.75pt .75pt .75pt .75pt"><big> </big>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Chinese<o:p></o:p></span></big></p>
<big> </big></td>
<td style="padding:.75pt .75pt .75pt .75pt"><big> </big>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">1249<o:p></o:p></span></big></p>
<big> </big></td>
<td style="padding:.75pt .75pt .75pt .75pt"><big> </big>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">103027<o:p></o:p></span></big></p>
<big> </big></td>
<td style="padding:.75pt .75pt .75pt .75pt"><big> </big>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">154541<o:p></o:p></span></big></p>
<big> </big></td>
<td style="padding:.75pt .75pt .75pt .75pt"><big> </big>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">4842<o:p></o:p></span></big></p>
<big> </big></td>
</tr>
</tbody>
</table>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><br>
Note that all token counts are based on the Chinese data only.
One token is equivalent to one character and one word is
equivalent to 1.5 characters.<o:p></o:p></span></big></p>
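<p>The relationship between the word and character-token figures in the table above can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and truncating the quotient to a whole number is an assumption, since the note does not say how the word count was rounded.</p>

```python
# Figures taken from the table above (Chinese data only).
CHAR_TOKENS = 154541
WORDS_REPORTED = 103027
CHARS_PER_WORD = 1.5  # ratio stated in the note: one word = 1.5 characters


def estimate_words(char_tokens: int, chars_per_word: float = CHARS_PER_WORD) -> int:
    """Estimate a word count from a character-token count.

    Truncation to an integer is an assumption made for this sketch.
    """
    return int(char_tokens / chars_per_word)


if __name__ == "__main__":
    estimated = estimate_words(CHAR_TOKENS)
    # Compare the estimate with the word count reported in the table.
    print(estimated, estimated == WORDS_REPORTED)
```

<p>Under these assumptions the estimate matches the reported word count, which suggests the word figure was derived from the character-token count and not counted independently.</p>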
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The
Chinese word alignment tasks consisted of the following
components: <o:p></o:p></span></big></p>
<ul type="disc">
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Identifying,
aligning, and tagging 8 different types of links<o:p></o:p></span></big></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Identifying,
attaching, and tagging local-level unmatched words<o:p></o:p></span></big></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Identifying
and tagging sentence/discourse-level unmatched words<o:p></o:p></span></big></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Identifying
and tagging all instances of Chinese </span><span
style="font-size:12.0pt; font-family:"MS
Mincho";mso-ascii-font-family:Calibri;mso-ascii-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">的</span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"> (DE),
except when they were part of a semantic link<o:p></o:p></span></big></li>
</ul>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
auto;text-align:center;line-height:normal" align="center"><big><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin">*<br
style="mso-special-character:line-break">
<br style="mso-special-character:line-break">
<o:p></o:p></span></big></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><big><a name="russian"></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(2)
</span><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T23"><span
style="mso-fareast-font-family:"Times New
Roman"">Russian-English Computer Security Parallel
Text</span></a></span><span style="font-size:12.0pt;
mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"> was developed by </span><span
style="font-size:12.0pt;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://www.mitre.org/"><span
style="mso-fareast-font-family:"Times New
Roman"">The MITRE Corporation</span></a></span><span
style="font-size:12.0pt;mso-fareast-font-family: "Times
New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">.
It consists of parallel sentences from a set of computer
security reports published in Russian and translated into
English by translators with particular expertise in the
technical area. Translators were instructed to err on the side
of literal translation if required, but to maintain the
technical writing style of the source and to make the
resulting English as natural as possible. The translators
followed specific translation guidelines, which are
included in this distribution.<o:p></o:p></span></big></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><big><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The
release contains 6,276 lines of parallel Russian and English
text, totaling 60,059 words of Russian and 76,437 words of
English, presented in a separate UTF-8 plain text file for
each language. The sentences were translated in sequential
order but are presented in scrambled order; sentences at
identical line numbers in the two files are translations of
each other. For example, the 31st line of the English file is
a translation of the 31st line of the Russian file. The
original line sequence is not provided. The 1,694 untranslated
lines (such as code snippets) are included in a separate file.<o:p></o:p></span></big></p>
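<p>Because the two files correspond line by line, sentence pairs can be recovered by reading both files in step. The following is a minimal sketch, not part of the distribution: the function name and the sample file names are hypothetical, and the demo writes two tiny stand-in files because the corpus file names are not given here.</p>

```python
import os
import tempfile
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(ru_path: str, en_path: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, russian, english) for lines at identical positions.

    Both files are read as UTF-8. A ValueError is raised if one file
    ends before the other, since the pairing would then be unreliable.
    """
    with open(ru_path, encoding="utf-8") as ru, open(en_path, encoding="utf-8") as en:
        for number, (ru_line, en_line) in enumerate(zip_longest(ru, en), start=1):
            if ru_line is None or en_line is None:
                raise ValueError(f"files differ in length at line {number}")
            yield number, ru_line.rstrip("\n"), en_line.rstrip("\n")


if __name__ == "__main__":
    # Stand-in data only; replace the paths with the real corpus files.
    with tempfile.TemporaryDirectory() as tmp:
        ru_path = os.path.join(tmp, "sample.ru.txt")
        en_path = os.path.join(tmp, "sample.en.txt")
        with open(ru_path, "w", encoding="utf-8") as handle:
            handle.write("Привет\nПароль\n")
        with open(en_path, "w", encoding="utf-8") as handle:
            handle.write("Hello\nPassword\n")
        for number, russian, english in read_parallel(ru_path, en_path):
            print(number, russian, english)
```

<p>Raising an error on a length mismatch is a design choice for this sketch: silently stopping at the shorter file would hide a misaligned pair of files.</p>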
<br>
<hr size="2" width="100%">
<div class="moz-text-html" lang="x-western">
<pre class="moz-signature" cols="72">--
Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium Phone: 1 (215) 573-1275
University of Pennsylvania Fax: 1 (215) 573-2175
3600 Market St., Suite 810 <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>
</pre>
</div>
</body>
</html>