<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal" align="center"><b><span
style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><span
style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">- <a
href="#scholar">Fall 2013 Data Scholarship Program</a>
-<br>
</span></span></span></span></b></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal" align="center"><i><span
style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><span
style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">New
publications:</span></span></span></span></i><b><span
style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><span
style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><br>
</span></span></span></span></b></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal" align="center"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">-
<a href="#prop">Chinese Proposition Bank 3.0</a> -<br>
</span></b></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal" align="center"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">-
<a href="#gale">GALE Arabic-English Parallel Aligned Treebank
-- Broadcast News Part 1</a> -</span></b></p>
<hr size="2" width="100%">
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal" align="center"><a name="scholar"></a><b
style="mso-bidi-font-weight:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">Fall 2013 Data
Scholarship Program<o:p></o:p></span></b></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Applications
are now being accepted through September 16, 2013, 11:59PM EST
for the Fall 2013 LDC Data Scholarship program! The LDC Data
Scholarship program provides university students with access to
LDC data at no cost.<o:p></o:p></span></p>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><br>
This program is open to students pursuing undergraduate or
graduate studies at an accredited college or university. LDC
Data Scholarships are not restricted to any particular field of
study; however, students must demonstrate a well-developed
research agenda and a bona fide inability to pay. The selection
process is highly competitive. <br>
<br>
The application consists of two parts: <br>
<br>
(1) <i>Data Use Proposal</i>. Applicants must submit a proposal
describing their intended use of the data. The proposal should
state which data the student plans to use, how the data will
benefit their research project, and the proposed methodology or
algorithm.<br>
<br>
Applicants should consult the </span><a
href="http://www.ldc.upenn.edu/Catalog/index.jsp"
target="_blank"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;color:blue">LDC Corpus
Catalog</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> for a complete list of
data distributed by LDC. Due to certain restrictions, a handful
of LDC corpora are available only to members of the Consortium.
Applicants are advised to select no more than one or two
databases.<br>
<br>
(2) <i>Letter of Support</i>. Applicants must submit one letter
of support from their thesis adviser or department chair. The
letter must confirm that the department or university lacks the
funding to pay the full Non-member Fee for the data and verify
the student's need for the data. <br>
<br>
For further information on application materials and program
rules, please visit the </span><a
href="http://www.ldc.upenn.edu/About/scholarships.html"
target="_blank"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:blue">LDC
Data
Scholarship</span></a><span
style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">
page. <br>
<br>
Students can email their applications to the </span><a
href="mailto:datascholarships@ldc.upenn.edu"><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:blue">LDC Data
Scholarship program</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">. Decisions will be
sent by email from the same address.<br>
<br>
The deadline for the Fall 2013 program<span
style="mso-spacerun:yes"> </span>is Monday, September 16,
2013, 11:59PM EST.<br>
<br>
</span></p>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> <br>
</span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> <b
style="mso-bidi-font-weight:normal">
New publications</b></span><br>
<span style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"></span></p>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><br>
<span style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><a name="prop"></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(1)
</span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2013T13"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;color:blue">Chinese
Proposition Bank 3.0</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">
is a continuation of the </span><a
href="http://www.cs.brandeis.edu/%7Eclp/ctb/cpb/"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;color:blue">Chinese
Proposition Bank</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> project, which aims to
create a corpus of text annotated with information about basic
semantic propositions. Chinese Proposition Bank 3.0 adds
predicate-argument annotation on 187,731 words from Chinese
Treebank 7.0 (</span><a
href="http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2010T07"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;color:blue">LDC2010T07</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">). The data sources
comprise newswire, magazine articles, various broadcast news
and broadcast conversation programming, web newsgroups and
weblogs. <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">LDC
has also released Chinese Proposition Bank 1.0 (</span><a
href="http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2005T23"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;color:blue">LDC2005T23</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">) and Chinese
Proposition Bank 2.0 (</span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T07"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;color:blue">LDC2008T07</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">).<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">This
release contains the predicate-argument annotation of 173,206
verb instances and 14,525 noun instances. The annotation of
nouns is limited to nominalizations that have a corresponding
verb. The general annotation guidelines and the lexical
guidelines (called frame files) for each verbal and nominal
predicate are also included in this release. Below are some
statistics about the corpus.<o:p></o:p></span></p>
<ul type="disc">
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Total
propositions for verbs - 173,206<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Total
propositions for nouns - 14,525<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Total
verbs framed - 24,642<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Total
framesets - 26,467<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Verbs
with multiple framesets - 1,337<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Average
framesets per verb - 1.07<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Total
nouns framed - 1,421<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Total
noun framesets - 1,528<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Nouns
with multiple framesets - 48<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l1 level1 lfo1;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Average
framesets per noun - 1.08<o:p></o:p></span></li>
</ul>
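<p class="MsoNormal">The two averages in the list above follow from the totals: total framesets divided by the number of predicates framed. The short Python sketch below recomputes them from the published counts as a quick consistency check. Its variable and helper names are illustrative only and are not part of the release.</p>
<pre>
```python
# Frame-file counts published for Chinese Proposition Bank 3.0 (copied from the list above).
verbs_framed = 24_642
verb_framesets = 26_467
nouns_framed = 1_421
noun_framesets = 1_528


def average_framesets(framesets, predicates):
    """Hypothetical helper: mean framesets per framed predicate, rounded to 2 places."""
    return round(framesets / predicates, 2)


verb_avg = average_framesets(verb_framesets, verbs_framed)
noun_avg = average_framesets(noun_framesets, nouns_framed)

print("Average framesets per verb:", verb_avg)  # 1.07
print("Average framesets per noun:", noun_avg)  # 1.08
```
</pre>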
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:
auto;text-align:center;line-height:normal" align="center"><span
style="font-size:12.0pt; mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin">*<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><a name="gale"></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(2)
</span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2013T14"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;color:blue">GALE
Arabic-English Parallel Aligned Treebank -- Broadcast News
Part 1</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> was developed by LDC
and contains 115,826 tokens of word-aligned Arabic and English
parallel text with treebank annotations. This material was used
as training data in the DARPA GALE (Global Autonomous Language
Exploitation) program.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Parallel
aligned treebanks are treebanks annotated with morphological and
syntactic structures aligned at the sentence level and the
sub-sentence level. Such data sets are useful for natural
language processing and related fields, including automatic word
alignment system training and evaluation, transfer-rule
extraction, word sense disambiguation, translation lexicon
extraction, and cultural heritage and cross-linguistic studies.
For machine translation system development, parallel aligned
treebanks may improve system performance through enhanced
syntactic parsers, better rules and knowledge about language
pairs, and reduced word error rate.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">In
this release, the source Arabic data was translated into
English. Arabic and English treebank annotations were performed
independently. The parallel texts were then word aligned. The
material in this corpus corresponds to a portion of the Arabic
treebanked data in Arabic Treebank - Broadcast News v1.0 (</span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T07"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;color:blue">LDC2012T07</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">).<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The
source data consists of Arabic broadcast news programming
collected by LDC in 2005 and 2006 from Alhurra, Aljazeera and
Dubai TV. All data is encoded as UTF-8. Counts of files, words,
tokens and segments are given below.<o:p></o:p></span></p>
<table class="MsoNormalTable" style="mso-cellspacing:1.5pt;
mso-yfti-tbllook:1184" border="1" cellpadding="0">
<tbody>
<tr style="mso-yfti-irow:0;mso-yfti-firstrow:yes">
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Language<o:p></o:p></span></p>
</td>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Files<o:p></o:p></span></p>
</td>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Words<o:p></o:p></span></p>
</td>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Tokens<o:p></o:p></span></p>
</td>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Segments<o:p></o:p></span></p>
</td>
</tr>
<tr style="mso-yfti-irow:1;mso-yfti-lastrow:yes">
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Arabic<o:p></o:p></span></p>
</td>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">28<o:p></o:p></span></p>
</td>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">89,213<o:p></o:p></span></p>
</td>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">115,826<o:p></o:p></span></p>
</td>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">4,824<o:p></o:p></span></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Note:
Word count is based on the untokenized Arabic source. Token
count is based on the ATB-tokenized Arabic source.<o:p></o:p></span></p>
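<p class="MsoNormal">ATB tokenization separates certain clitics, such as attached conjunctions and pronouns, into their own tokens. This is why the token count is higher than the word count. The minimal Python sketch below derives a few ratios from the table. The variable names are hypothetical, and the inputs are simply the published counts.</p>
<pre>
```python
# Arabic counts from the table above (untokenized words, ATB-tokenized tokens).
files = 28
words = 89_213
tokens = 115_826
segments = 4_824

tokens_per_word = tokens / words  # a ratio above 1 reflects ATB splitting clitics off words
words_per_segment = words / segments
tokens_per_segment = tokens / segments
segments_per_file = segments / files

print(f"tokens per word:    {tokens_per_word:.2f}")     # 1.30
print(f"words per segment:  {words_per_segment:.2f}")   # 18.49
print(f"tokens per segment: {tokens_per_segment:.2f}")  # 24.01
print(f"segments per file:  {segments_per_file:.2f}")   # 172.29
```
</pre>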
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The
purpose of the GALE word alignment task was to find
correspondences between words, phrases or groups of words in a
set of parallel texts. Arabic-English word alignment annotation
consisted of the following tasks:<o:p></o:p></span></p>
<ul type="disc">
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Identifying
different types of links: translated (correct or incorrect)
and not translated (correct or incorrect)<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Identifying
sentence segments not suitable for annotation, e.g., blank
segments, incorrectly segmented segments, segments with
foreign languages<o:p></o:p></span></li>
<li class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal;mso-list:l0 level1 lfo2;tab-stops:list .5in"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Tagging
unmatched words attached to other words or phrases<o:p></o:p></span></li>
</ul>
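<p class="MsoNormal">The actual annotation file format of this release is not described here. Purely to illustrate the tasks listed above, the Python sketch below shows one hypothetical way to hold word alignment links in memory. Each link joins a group of source token indices to a group of target token indices, so a single link can cover a word, a phrase or a group of words. Each link also carries one of the four link types. All class, field and label names are assumptions, not LDC's schema.</p>
<pre>
```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, List


class LinkType(Enum):
    """Hypothetical labels for the four link categories listed above."""
    TRANSLATED_CORRECT = "translated-correct"
    TRANSLATED_INCORRECT = "translated-incorrect"
    NOT_TRANSLATED_CORRECT = "not-translated-correct"
    NOT_TRANSLATED_INCORRECT = "not-translated-incorrect"


@dataclass(frozen=True)
class Link:
    """One link between a group of source tokens and a group of target tokens.

    Index sets let a single link cover a word, a phrase or a discontinuous
    group of words on either side of the parallel segment.
    """
    source: FrozenSet[int]
    target: FrozenSet[int]
    kind: LinkType


@dataclass
class Segment:
    """A parallel segment and its links (hypothetical schema, not LDC's format)."""
    n_source: int
    n_target: int
    links: List[Link] = field(default_factory=list)
    rejected_reason: str = ""  # e.g. "blank segment"; empty string means annotatable

    def unlinked_source(self):
        """Return sorted source token indices not covered by any link."""
        covered = set()
        for link in self.links:
            covered |= link.source
        return sorted(set(range(self.n_source)) - covered)


# Usage: a 4-token source segment aligned to a 5-token target segment.
seg = Segment(n_source=4, n_target=5)
seg.links.append(Link(frozenset({0}), frozenset({0, 1}), LinkType.TRANSLATED_CORRECT))
seg.links.append(Link(frozenset({2, 3}), frozenset({3}), LinkType.TRANSLATED_CORRECT))
print(seg.unlinked_source())  # [1]
```
</pre>
<p class="MsoNormal">Sets of indices are used here rather than single index pairs because the task description allows links between phrases or groups of words, not only one-to-one word pairs.</p>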
<br>
<hr size="2" width="100%"><br>
<pre class="moz-signature" cols="72">--
--
Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium Phone: 1 (215) 573-1275
University of Pennsylvania Fax: 1 (215) 573-2175
3600 Market St., Suite 810 <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>
</body>
</html>