<div dir="ltr"><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">------------------------------</span><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">------------------------------</span><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">------------</span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<span style="font-size:13.333333969116211px;font-family:arial,sans-serif">Arabic-L: Fri 18 Jul 2014</span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif"><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">Moderator: Dilworth Parkinson <</span><a href="mailto:dilworth_parkinson@byu.edu" style="font-size:13.333333969116211px;font-family:arial,sans-serif" target="_blank">dilworth_parkinson@byu.edu</a><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">></span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<span style="font-size:13.333333969116211px;font-family:arial,sans-serif">[To post messages to the list, send them to </span><a href="mailto:arabic-l@byu.edu" style="font-size:13.333333969116211px;font-family:arial,sans-serif" target="_blank">arabic-l@byu.edu</a><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">]</span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<span style="font-size:13.333333969116211px;font-family:arial,sans-serif">[To unsubscribe, send message from same address you subscribed from to</span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<a href="mailto:listserv@byu.edu" style="font-size:13.333333969116211px;font-family:arial,sans-serif" target="_blank">listserv@byu.edu</a><span style="font-size:13.333333969116211px;font-family:arial,sans-serif"> with first line reading:</span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<span style="font-size:13.333333969116211px;font-family:arial,sans-serif"> unsubscribe arabic-l ]</span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<br style="font-size:13.333333969116211px;font-family:arial,sans-serif"><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">-------------------------</span><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">Directory---------------------</span><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">---------------</span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<br style="font-size:13.333333969116211px;font-family:arial,sans-serif"><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">1) Subject: </span><font face="arial, sans-serif">GALE Arabic-English Word Alignment Training Part 3-Web</font><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<br style="font-size:13.333333969116211px;font-family:arial,sans-serif"><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">-------------------------</span><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">Messages----------------------</span><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">-------------</span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<span style="font-size:13.333333969116211px;font-family:arial,sans-serif">1)</span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif"><span style="font-size:13.333333969116211px;font-family:arial,sans-serif">Date: </span><span style="font-size:13px;font-family:arial,sans-serif">18 Jul 2014</span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<span style="font-size:13.333333969116211px;font-family:arial,sans-serif">From: reposted from LDC <</span><span style="color:rgb(85,85,85);font-family:arial,sans-serif;font-size:13px;white-space:nowrap"><a href="mailto:ldc@ldc.upenn.edu" target="_blank">ldc@ldc.upenn.edu</a>></span><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<span style="font-size:13.333333969116211px;font-family:arial,sans-serif">Subject: </span><font face="arial, sans-serif">GALE Arabic-English Word Alignment Training Part 3-Web</font><br style="font-size:13.333333969116211px;font-family:arial,sans-serif">
<br style="font-size:13.333333969116211px;font-family:arial,sans-serif"><p class="MsoNormal" style="font-family:arial,sans-serif;font-size:13px"><a href="https://catalog.ldc.upenn.edu/LDC2014T14" target="_blank"><span style="color:blue">GALE Arabic-English Word Alignment Training Part 3 -- Web</span></a> was developed by the Linguistic Data Consortium (LDC) and contains 217,158 tokens of word-aligned Arabic and English parallel text enriched with linguistic tags. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program.</p>
<p class="MsoNormal" style="font-family:arial,sans-serif;font-size:13px">Some approaches to statistical machine translation incorporate linguistic knowledge into word-aligned text as a means of improving automatic word alignment and machine translation quality. In this corpus, that knowledge is added through two annotation schemes: alignment and tagging. Alignment identifies minimum translation units and translation relations using minimum-match and attachment annotation approaches. The tagging scheme defines a set of word tags and alignment link tags to describe these translation units and relations, adding contextual, syntactic and language-specific features to the alignment annotation.</p>
<p class="MsoNormal" style="font-family:arial,sans-serif;font-size:13px">Other releases available in this series are:<u></u><u></u></p><blockquote style="font-family:arial,sans-serif;font-size:13px"><p class="MsoNormal">
GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web (<a href="http://catalog.ldc.upenn.edu/LDC2012T16" target="_blank"><span style="color:blue">LDC2012T16</span></a>)<u></u><u></u></p>
<p class="MsoNormal">GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire (<a href="http://catalog.ldc.upenn.edu/LDC2012T20" target="_blank"><span style="color:blue">LDC2012T20</span></a>)<u></u><u></u></p>
<p class="MsoNormal">GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web (<a href="http://catalog.ldc.upenn.edu/LDC2012T24" target="_blank"><span style="color:blue">LDC2012T24</span></a>)<u></u><u></u></p>
<p class="MsoNormal">GALE Chinese-English Word Alignment and Tagging Training Part 4 -- Web (<a href="http://catalog.ldc.upenn.edu/LDC2013T05" target="_blank"><span style="color:blue">LDC2013T05</span></a>)<u></u><u></u></p>
<p class="MsoNormal">GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 1 (<a href="http://catalog.ldc.upenn.edu/LDC2013T23" target="_blank"><span style="color:blue">LDC2013T23</span></a>)<u></u><u></u></p>
<p class="MsoNormal">GALE Arabic-English Word Alignment Training Part 1 -- Newswire and Web (<a href="http://catalog.ldc.upenn.edu/LDC2014T05" target="_blank"><span style="color:blue">LDC2014T05</span></a>)<u></u><u></u></p>
<p class="MsoNormal">GALE Arabic-English Word Alignment Training Part 2 -- Newswire (<a href="http://catalog.ldc.upenn.edu/LDC2014T10" target="_blank"><span style="color:blue">LDC2014T10</span></a>)<u></u><u></u></p></blockquote>
<p class="MsoNormal" style="font-family:arial,sans-serif;font-size:13px">This release consists of Arabic source web data collected by LDC. The distribution by genre, words, character tokens and segments appears below:<u></u><u></u></p>
<table border="1" cellpadding="0" style="font-family:arial,sans-serif;font-size:13px"><tbody><tr><td style="padding:0.75pt"><p class="MsoNormal">Language<u></u><u></u></p></td><td style="padding:0.75pt"><p class="MsoNormal">
Genre<u></u><u></u></p></td><td style="padding:0.75pt"><p class="MsoNormal">Files<u></u><u></u></p></td><td style="padding:0.75pt"><p class="MsoNormal">Words<u></u><u></u></p></td><td style="padding:0.75pt"><p class="MsoNormal">
Character Tokens<u></u><u></u></p></td><td style="padding:0.75pt"><p class="MsoNormal">Segments<u></u><u></u></p></td></tr><tr><td style="padding:0.75pt"><p class="MsoNormal">Arabic<u></u><u></u></p></td><td style="padding:0.75pt">
<p class="MsoNormal">WB<u></u><u></u></p></td><td style="padding:0.75pt"><p class="MsoNormal">2,449<u></u><u></u></p></td><td style="padding:0.75pt"><p class="MsoNormal">154,144<u></u><u></u></p></td><td style="padding:0.75pt">
<p class="MsoNormal">217,158<u></u><u></u></p></td><td style="padding:0.75pt"><p class="MsoNormal">7,332<u></u><u></u></p></td></tr></tbody></table><p class="MsoNormal" style="font-family:arial,sans-serif;font-size:13px">
Note that the word count is based on the untokenized Arabic source, and the token count is based on the tokenized Arabic source.</p><p class="MsoNormal" style="font-family:arial,sans-serif;font-size:13px">The Arabic word alignment tasks consisted of the following components:</p>
<blockquote style="font-family:arial,sans-serif;font-size:13px"><p class="MsoNormal">Normalizing tokens in the tokenized Arabic source as needed</p><p class="MsoNormal">Identifying different types of links</p><p class="MsoNormal">
Identifying sentence segments not suitable for annotation<u></u><u></u></p><p class="MsoNormal">Tagging unmatched words attached to other words or phrases<u></u><u></u></p></blockquote><p class="MsoNormal" style="font-family:arial,sans-serif;font-size:13px">
GALE Arabic-English Word Alignment Training Part 3 -- Web is distributed via web download.</p><p class="MsoNormal" style="font-family:arial,sans-serif;font-size:13px">2014 Subscription Members will automatically receive two copies of this data on disc. 2014 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1,750. <br>
</p><div><br></div><div><br></div><div style="font-size:13.333333969116211px;font-family:arial,sans-serif">--------------------------------------------------------------------------<br>
End of Arabic-L:<span style="font-size:13.333333969116211px"> </span>18 Jul 2014</div></div>