<font size="2"><span style="color: black;" lang="EN-GB">Two days ago I asked about Word Alignment, and eight colleagues kindly responded (<span class="gd">Alberto Simões, Afsaneh Fazly, Mark Sammons, Graeme Hirst, Felipe Sánchez Martínez, Dekai Wu</span><span class="go">, </span><span class="gd">Michael Barlow, and João Graça</span>). </span></font>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">One of my first observations from the informative responses is that, with one or two exceptions, most of the colleagues are from Computer Science departments and work in Computational Linguistics. This might be a perfect excuse for my not being aware of the enormous amount of work done on Word Alignment, as I am a linguist with a theoretical flavour. :) Most linguists in contrastive linguistics and translation studies see sentence alignment as the only reliable and viable correspondence of linguistic units. However, when we look beyond the scope of pure language studies, as the discussion made clear, alignment work goes far beyond sentence alignment.<br>

</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">I’d summarize the

discussions as follows:</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">Word Alignments are

used in a variety of applications.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">1. All Statistical Machine Translation systems, which start from word alignments to extract translation units.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">2. Jointly training

models in different languages and coupling them for better learning.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">3. Passing annotations

from one language to the other.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">There are several good implementations of word alignment: PostCAT, Berkeley Aligner, GIZA++ (Franz Och), and mkcls (Franz Och), just to name a few.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">Word alignments do not have to be one-to-one; they can be many-to-many, and hence we can have phrase alignments.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-US">--the

above adapted from </span><span class="gd"><span style="color: black;" lang="EN-GB">João

Graça</span></span></font></p>
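
<p class="MsoNormal"><font size="2"><span lang="EN-GB">To make the many-to-many point and the idea of passing annotations across languages more concrete, here is a minimal Python sketch of my own. It assumes an alignment is stored as a set of (source index, target index) links; the function names <code>alignment_units</code> and <code>project_labels</code> are hypothetical and do not come from any of the toolkits mentioned above. Grouping links that share a word yields the smallest many-to-many units, which is one simple way to read phrase alignments off word alignments.</span></font></p>

```python
from collections import defaultdict


def alignment_units(links):
    """Group word-alignment links into minimal many-to-many units.

    links: iterable of (source_index, target_index) pairs.
    Returns a list of (source_indices, target_indices) tuples, one per
    connected component of the bipartite graph formed by the links.
    """
    src_to_tgt = defaultdict(set)
    tgt_to_src = defaultdict(set)
    for s, t in links:
        src_to_tgt[s].add(t)
        tgt_to_src[t].add(s)

    seen_src = set()
    units = []
    for start in sorted(src_to_tgt):
        if start in seen_src:
            continue
        src_group, tgt_group = set(), set()
        stack = [("s", start)]
        # Walk back and forth across the links until the unit is closed.
        while stack:
            side, idx = stack.pop()
            if side == "s":
                if idx in src_group:
                    continue
                src_group.add(idx)
                stack.extend(("t", t) for t in src_to_tgt[idx])
            else:
                if idx in tgt_group:
                    continue
                tgt_group.add(idx)
                stack.extend(("s", s) for s in tgt_to_src[idx])
        seen_src |= src_group
        units.append((sorted(src_group), sorted(tgt_group)))
    return units


def project_labels(source_labels, links, target_length, default="O"):
    """Copy each source word's label to every target word linked to it.

    Target words without a link keep the default label. If several source
    words link to the same target word, the rightmost source word wins,
    which is an arbitrary tie-break chosen for this sketch.
    """
    projected = [default] * target_length
    for s, t in sorted(links):
        projected[t] = source_labels[s]
    return projected
```

<p class="MsoNormal"><font size="2"><span lang="EN-GB">As an illustration with hand-made links for "I do not know" and "je ne sais pas", the link set {(0, 0), (1, 2), (3, 2), (2, 1), (2, 3)} groups into three units: "I" with "je", "do ... know" with "sais", and "not" with "ne ... pas". Real projection pipelines usually add filtering for noisy links, which this sketch leaves out.</span></font></p>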

<p class="MsoNormal"><font size="2"><span class="gd"><span style="color: black;" lang="EN-GB"> </span></span></font></p>

<p class="MsoNormal" style="text-align: left;" align="left"><font size="2"><span lang="EN-US">Word alignment is, to some extent, term alignment, possibly alignment of terms with blanks or placeholders, and possibly alignment to empty words (just as sentences can be aligned to empty sentences).</span></font></p>

<p class="MsoNormal" style="text-align: left;" align="left"><font size="2"><span lang="EN-US">Aligning 100% of the words might be difficult or even impossible (for compound verbs, for instance).</span></font></p>

<p class="MsoNormal" style="text-align: left;" align="left"><font size="2"><span lang="EN-US">Calling it ‘a joke’ (the inappropriate wording in my earlier post) can be offensive to people </span><span style="color: black;" lang="EN-US">working on word alignment.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-US">--the

above adapted from Alberto

Simões</span></font></p>
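
<p class="MsoNormal"><font size="2"><span lang="EN-GB">The points about empty words and less-than-100% alignment can also be sketched in a few lines of Python. This is only my own illustration, again assuming links are (source index, target index) pairs; the helper names <code>unaligned_words</code> and <code>coverage</code> are hypothetical. Words that no link touches are the ones an aligner would treat as aligned to an empty word.</span></font></p>

```python
def unaligned_words(words, links, side):
    """Return the words on one side that no link touches.

    side is 0 for the source sentence and 1 for the target sentence.
    """
    linked = {link[side] for link in links}
    return [w for i, w in enumerate(words) if i not in linked]


def coverage(words, links, side):
    """Fraction of words on one side that take part in at least one link."""
    if not words:
        return 1.0
    return 1 - len(unaligned_words(words, links, side)) / len(words)
```

<p class="MsoNormal"><font size="2"><span lang="EN-GB">For instance, if "do" in "I do not know" is left without a link to "je ne sais pas", it is the only unaligned source word and the source-side coverage is three words out of four.</span></font></p>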

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-US"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Related implementations and literature </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">The Mathematics of Statistical Machine

Translation: Parameter Estimation.</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="IT">Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra,</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Robert L. Mercer. Computational Linguistics, 19(2):263–311. 1993.</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">The alignment template approach to

statistical machine translation.</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Franz Josef Och and Hermann Ney.

Computational Linguistics, 30:417–449. 2004.</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Jörg Tiedemann's book "Bitext

Alignment", which is about to be published (probably this week!) by Morgan

& Claypool (<a href="http://morganclaypool.com">morganclaypool.com</a>) in their HLT Synthesis series.<span>  </span>It includes a 45-page chapter on word

alignment. (provided by Graeme Hirst)</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Word alignment implementations have been

around for a while: GIZA++ (<a href="http://code.google.com/p/giza-pp/">http://code.google.com/p/giza-pp/</a>) is the most

used, but there are other word aligners such as BerkeleyAligner

(<a href="http://code.google.com/p/berkeleyaligner">http://code.google.com/p/berkeleyaligner</a>).</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">GIZA++ implements the alignment models described in</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Och, Franz Josef, and Hermann Ney (2003)

"A Systematic Comparison of Various Statistical Alignment Models." </span><span lang="IT">Computational Linguistics 29(1): 19-51.

<a href="http://acl.ldc.upenn.edu/J/J03/J03-1002.pdf">http://acl.ldc.upenn.edu/J/J03/J03-1002.pdf</a></span></font></p>
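
<p class="MsoNormal"><font size="2"><span lang="EN-GB">For fellow linguists who want a feel for what such statistical models do, below is a toy Python sketch of the simplest of them, IBM Model 1 from Brown et al. (1993), trained with EM. It is my own simplified illustration, not GIZA++ code: the function names <code>train_ibm_model1</code> and <code>viterbi_alignment</code>, the "NULL" token string, and the default of 10 iterations are all assumptions made for the example, and it expects a non-empty list of sentence pairs.</span></font></p>

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """Estimate t(target_word | source_word) with EM (IBM Model 1).

    bitext: non-empty list of (source_words, target_words) sentence pairs.
    A NULL token is prepended to every source sentence so that target
    words can align to the empty word.
    Returns a dict-like table keyed by (target_word, source_word).
    """
    bitext = [(["NULL"] + list(src), list(tgt)) for src, tgt in bitext]
    target_vocab = {w for _, tgt in bitext for w in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform start for every word pair

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (f, e) link
        total = defaultdict(float)  # expected count of each source word e
        # E-step: share each target word among the source words of its pair.
        for src, tgt in bitext:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


def viterbi_alignment(t, src, tgt):
    """Link each target word to its most probable source word.

    Target words whose best candidate is NULL receive no link.
    Returns a list of (source_index, target_index) pairs.
    """
    candidates = ["NULL"] + list(src)
    links = []
    for j, f in enumerate(tgt):
        best = max(range(len(candidates)), key=lambda i: t[(f, candidates[i])])
        if best != 0:
            links.append((best - 1, j))
    return links
```

<p class="MsoNormal"><font size="2"><span lang="EN-GB">On a three-sentence toy corpus such as "the house / das Haus", "the book / das Buch", "a book / ein Buch", the table quickly comes to prefer "Haus" over "das" as the translation of "house", simply because "das" is better explained by "the". The models in GIZA++ add word order, fertility and much else on top of this basic idea.</span></font></p>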

<p class="MsoNormal"><font size="2"><span lang="IT"><span> </span></span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Dekai Wu’s "Alignment" chapter in

the Handbook of Natural Language Processing.<span> 

</span>The chapter has been extensively revised for the new second edition

(2010), edited by N. Indurkhya & F.J. Damerau, Chapman and Hall / CRC

Press, pp.367-408. (It covers token vs segmental alignments, at word,

phrase/collocation, and sentence levels. Starting from flat models, it

progressively moves to compositional/hierarchical models that can handle the

sorts of constructions and idioms you are thinking about, using biparsing with

transduction grammars.)</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span style="color: black;" lang="EN-GB">Thanks go to all the participants in the discussion, which was enlightening and informative indeed.</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p><font size="2">

</font><p class="MsoNormal"><font size="2"><span lang="EN-GB">Best wishes,</span></font></p><p class="MsoNormal"><font size="2"><span lang="EN-GB"><br></span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Jiajin XU</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Beijing Foreign Studies University</span></font></p>