<font size="2"><span style="color: black;" lang="EN-GB">Two days ago, I posted a query
about word alignment, and eight colleagues kindly
responded (<span class="gd">Alberto Simões, Afsaneh Fazly, Mark Sammons, Graeme
Hirst, Felipe Sánchez Martínez, Dekai Wu</span><span class="go">, </span><span class="gd">Michael Barlow, and João Graça</span>). </span></font>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">One of my first observations
from these informative responses is that, with one or two
exceptions, the respondents are based in computer science departments and work
in computational linguistics. This is perhaps my best excuse for not having been
aware of the enormous body of work on word alignment: I am a linguist with a theoretical
flavour. :) Most linguists in contrastive linguistics and translation studies regard sentence alignment as the only reliable and viable correspondence between linguistic units. However, as this discussion has shown, once we look around and beyond the scope of pure language studies, alignment work covers far more than sentence alignment.<br>
</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">I’d summarize the
discussion as follows:</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">Word alignments are
used in a variety of applications:</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">1. Statistical
machine translation: all SMT systems start from word alignments to extract
translation units.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">2. Jointly training
models in different languages and coupling them for better learning.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">3. Passing annotations
from one language to the other.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">There are several good
implementations of word alignment, to name just a few: PostCAT, the Berkeley Aligner and GIZA++ (Franz Och),
together with mkcls (Franz Och), which trains the word classes that GIZA++ can use.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">Word alignments do not
have to be one-to-one; they can also be many-to-many, which is how we get phrase
alignments.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-US">--the
above adapted from </span><span class="gd"><span style="color: black;" lang="EN-GB">João
Graça</span></span></font></p>
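<p class="MsoNormal"><font size="2"><span lang="EN-GB">To make the point about many-to-many links concrete, here is a minimal Python sketch of the "consistent phrase pair" idea used for phrase extraction in phrase-based SMT (cf. Och and Ney 2004, listed below). It is an illustration only, not code from any of the tools mentioned: the function name extract_phrases, the representation of an alignment as a set of 0-based (source, target) index pairs, and the toy French-English alignment are all assumptions, and unlike full extractors it does not extend phrases over unaligned words at their edges.</span></font></p>

```python
from typing import List, Set, Tuple

# Hypothetical representation: a word alignment as a set of
# (source_index, target_index) links, 0-based. Links may be many-to-many.
Alignment = Set[Tuple[int, int]]


def extract_phrases(src: List[str], tgt: List[str], links: Alignment,
                    max_len: int = 4) -> List[Tuple[str, str]]:
    """Return phrase pairs consistent with a word alignment.

    A source span and a target span are consistent if at least one link lies
    inside both and no link connects a word inside one span to a word outside
    the other. Simplification: unaligned words at span edges are not added.
    """
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Smallest target span covering every word linked to the source span.
            t_points = [t for (s, t) in links if s_start <= s <= s_end]
            if not t_points:
                continue  # the source span is entirely unaligned
            t_start, t_end = min(t_points), max(t_points)
            if t_end - t_start + 1 > max_len:
                continue
            # Reject if a target word in the span is linked outside the source span.
            if any(t_start <= t <= t_end and not (s_start <= s <= s_end)
                   for (s, t) in links):
                continue
            pairs.append((" ".join(src[s_start:s_end + 1]),
                          " ".join(tgt[t_start:t_end + 1])))
    return pairs


if __name__ == "__main__":
    src = "il ne va pas".split()
    tgt = "he does not go".split()
    # Toy alignment: "ne" and "pas" both link to "not"; "va" links to "does" and "go".
    links = {(0, 0), (1, 2), (3, 2), (2, 1), (2, 3)}
    for pair in extract_phrases(src, tgt, links):
        print(pair)
```

<p class="MsoNormal"><font size="2"><span lang="EN-GB">On this toy pair the sketch returns il / he, ne va pas / does not go and il ne va pas / he does not go: because ne and pas are both linked to not, neither can be extracted on its own, and the discontinuous ne ... pas only surfaces inside a larger phrase.</span></font></p>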
<p class="MsoNormal"><font size="2"><span class="gd"><span style="color: black;" lang="EN-GB"> </span></span></font></p>
<p class="MsoNormal" style="text-align: left;" align="left"><font size="2"><span lang="EN-US">To some extent, word alignment is term alignment; it may also involve aligning terms containing
blanks or placeholders, and aligning words to empty words (just
as sentence alignment allows alignment to empty sentences).</span></font></p>
<p class="MsoNormal" style="text-align: left;" align="left"><font size="2"><span lang="EN-US">Achieving 100% word alignment may be difficult or even impossible (for
compound verbs, for instance).</span></font></p>
<p class="MsoNormal" style="text-align: left;" align="left"><font size="2"><span lang="EN-US">Calling it ‘a joke’ (the inappropriate wording in my original post) may be
offensive to people </span><span style="color: black;" lang="EN-US">working on word alignment.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-US">--the
above adapted from Alberto
Simões</span></font></p>
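<p class="MsoNormal"><font size="2"><span lang="EN-GB">Alberto's points about empty words and compound verbs can be illustrated with a small Python sketch. It assumes the common plain-text convention of writing each link as a 0-based source-target index pair such as 1-2; the helper names parse_links and unaligned and the German example are hypothetical. Words that no link touches are the ones aligned to the empty word, and the German particle verb ruft ... an shows a single "compound" verb ending up as two discontinuous links to one English word.</span></font></p>

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # hypothetical: (source_index, target_index), 0-based


def parse_links(line: str) -> Set[Link]:
    """Parse links written as space-separated 'i-j' pairs, e.g. '0-0 1-2'."""
    links = set()
    for token in line.split():
        s, t = token.split("-")
        links.add((int(s), int(t)))
    return links


def unaligned(n_src: int, n_tgt: int, links: Set[Link]) -> Tuple[List[int], List[int]]:
    """Return source and target positions with no link (aligned to the empty word)."""
    src_linked = {s for s, _ in links}
    tgt_linked = {t for _, t in links}
    return ([i for i in range(n_src) if i not in src_linked],
            [j for j in range(n_tgt) if j not in tgt_linked])


if __name__ == "__main__":
    src = "Er ruft mich morgen an".split()
    tgt = "He will call me tomorrow".split()
    # The separable verb "ruft ... an" is linked, discontinuously, to "call";
    # "will" has no German counterpart and stays unaligned.
    links = parse_links("0-0 1-2 4-2 2-3 3-4")
    src_null, tgt_null = unaligned(len(src), len(tgt), links)
    print([tgt[j] for j in tgt_null])  # ['will']
```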
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-US"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Related implementations and literature </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">The Mathematics of Statistical Machine
Translation: Parameter Estimation.</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Peter F. Brown,
Vincent J. Della Pietra, Stephen A. Della Pietra, Robert L. Mercer.
Computational Linguistics, 19(2):263–311. 1993.</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">The alignment template approach to
statistical machine translation.</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Franz Josef Och and Hermann Ney.
Computational Linguistics, 30:417–449. 2004.</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Jörg Tiedemann's book "Bitext
Alignment", which is about to be published (probably this week!) by Morgan
& Claypool (<a href="http://morganclaypool.com">morganclaypool.com</a>) in their HLT Synthesis series.<span> </span>It includes a 45-page chapter on word
alignment. (provided by Graeme Hirst)</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Word alignment implementations have been
around for a while: GIZA++ (<a href="http://code.google.com/p/giza-pp/">http://code.google.com/p/giza-pp/</a>) is the most
widely used, but there are other word aligners, such as the BerkeleyAligner
(<a href="http://code.google.com/p/berkeleyaligner">http://code.google.com/p/berkeleyaligner</a>).</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">GIZA++ implements the alignment models
described in</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Och, Franz Josef, and Hermann Ney (2003)
"A Systematic Comparison of Various Statistical Alignment Models." </span><span lang="IT">Computational Linguistics 29(1): 19-51.
<a href="http://acl.ldc.upenn.edu/J/J03/J03-1002.pdf">http://acl.ldc.upenn.edu/J/J03/J03-1002.pdf</a></span></font></p>
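<p class="MsoNormal"><font size="2"><span lang="EN-GB">For readers coming from linguistics, the simplest of these models, IBM Model 1 (Brown et al. 1993), can be sketched in a few lines of Python as expectation-maximisation over lexical translation probabilities t(f | e). This is a teaching sketch, not GIZA++'s code: GIZA++ also implements the higher IBM models and the HMM model, with many refinements. The function names, the NULL token string and the three-sentence toy corpus are assumptions made for illustration.</span></font></p>

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NULL = "<NULL>"  # hypothetical empty-word token added to every English side


def train_model1(corpus: List[Tuple[List[str], List[str]]],
                 iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate IBM Model 1 lexical probabilities t(f | e) with EM.

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    """
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in corpus:
            es = [NULL] + es
            for f in fs:
                # E-step: share f's unit count among all e in proportion to t(f|e).
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into probabilities per e.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def viterbi_align(fs: List[str], es: List[str],
                  t: Dict[Tuple[str, str], float]) -> List[int]:
    """Most probable English position for each foreign word (0 = NULL)."""
    es = [NULL] + es
    return [max(range(len(es)), key=lambda i: t[(f, es[i])]) for f in fs]


if __name__ == "__main__":
    corpus = [("das Haus".split(), "the house".split()),
              ("das Buch".split(), "the book".split()),
              ("ein Buch".split(), "a book".split())]
    t = train_model1(corpus, iterations=10)
    for fs, es in corpus:
        print(fs, "->", viterbi_align(fs, es, t))
```

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Even on this tiny corpus, EM learns that das goes with the and Buch with book simply because they co-occur repeatedly, which is the intuition that the more elaborate models in GIZA++ build upon.</span></font></p>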
<p class="MsoNormal"><font size="2"><span lang="IT"><span> </span></span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Dekai Wu’s "Alignment" chapter in
the Handbook of Natural Language Processing.<span>
</span>The chapter has been extensively revised for the new second edition
(2010), edited by N. Indurkhya & F.J. Damerau, Chapman and Hall / CRC
Press, pp. 367–408. (It covers token vs. segmental alignments at the word,
phrase/collocation and sentence levels. Starting from flat models, it
progressively moves to compositional/hierarchical models that can handle the
sorts of constructions and idioms you are thinking about, using biparsing with
transduction grammars.)</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">Thanks go to all
the participants in the discussion, which was enlightening and informative indeed.</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p><font size="2">
</font><p class="MsoNormal"><font size="2"><span lang="EN-GB">Best wishes,</span></font></p><p class="MsoNormal"><font size="2"><span lang="EN-GB"><br></span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Jiajin XU</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Beijing Foreign Studies University</span></font></p>