<div><br></div><div>Just to make your list of applications a bit more complete:</div><div><br></div><div>* word alignment can also be useful for lexicon extraction and rule induction in non-statistical machine translation</div>
<div>* it can be used for chunk alignment in example-based MT</div><div>* word alignment is used for extracting domain-specific terminology translations (mostly for computer-aided translation)</div><div>* word alignment has been used for word sense disambiguation/discrimination (using translations as "semantic mirrors" with different lexical ambiguities)</div>
<div>* it can be used to extract (WordNet-like) lexico-semantic relations (in one language and/or across languages)</div><div>* word alignment can be applied to find (monolingual) term variants, which have been used for query expansion in IR and QA</div>
<div>* the extraction of paraphrases is another application where word alignment has been used</div><div>* interestingly enough, the limitations of automatic word alignment can also be used to identify non-compositional (idiomatic) expressions</div>
<div><br></div><div>I could give you a lot of pointers if you like.</div><div><br></div><div>Clearly, there is some confusion about the term "alignment": in word alignment it does not mean a monotonic, complete, one-to-one mapping. Automatic word alignment is also certainly very noisy, so the alignment is usually not saved but only used to support another task (like translation modeling in SMT). Looking at automatic word alignment results can sometimes (often?) feel like a joke, but they can still be useful for many tasks, as you have seen in the responses to your query.</div>
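<div><br></div><div>As a rough illustration of what "not monotonic, complete, or one-to-one" means, here is a minimal Python sketch. It treats an alignment as a set of (source position, target position) links and reports which of the three properties hold. The function name, the toy English/French sentence pair and the links are all made up for illustration; they are not the output of any aligner.</div>
<pre>
```python
from collections import Counter


def describe_alignment(links, src_len, tgt_len):
    """Report which mapping properties a set of (source, target) links has."""
    links = sorted(set(links))
    src_counts = Counter(s for s, _ in links)
    tgt_counts = Counter(t for _, t in links)
    # One-to-one: no aligned word takes part in more than one link.
    one_to_one = (all(c == 1 for c in src_counts.values())
                  and all(c == 1 for c in tgt_counts.values()))
    # Complete: every source word and every target word has a link.
    complete = len(src_counts) == src_len and len(tgt_counts) == tgt_len
    # Monotonic: reading links in source order, target positions never go back.
    targets = [t for _, t in links]
    monotonic = targets == sorted(targets)
    return {"one_to_one": one_to_one, "complete": complete, "monotonic": monotonic}


# Hypothetical toy example (hand-made links, not aligner output).
src = ["I", "do", "not", "like", "red", "wine"]
tgt = ["je", "n'", "aime", "pas", "le", "vin", "rouge"]
links = [(0, 0), (2, 1), (2, 3), (3, 2), (4, 6), (5, 5)]

props = describe_alignment(links, len(src), len(tgt))
print(props)
```
</pre>
<div>In this toy pair "not" links to both "n'" and "pas", "do" and "le" stay unaligned, and "red wine" / "vin rouge" cross, so none of the three properties holds.</div>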
<div><br></div><div>Good luck with further discussions with your colleagues!</div><div><br></div><div>Jörg</div><div><br><br><div class="gmail_quote">On Thu, Jun 2, 2011 at 5:14 PM, Xu Jiajin <span dir="ltr">&lt;<a href="mailto:ustcxujj@gmail.com">ustcxujj@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><font size="2"><span style="color:black" lang="EN-GB">Two days ago, I asked
about Word Alignment, and eight colleagues kindly responded
(<span>Alberto Simões, Afsaneh Fazly, Mark Sammons, Graeme
Hirst, Felipe Sánchez Martínez, Dekai Wu</span><span>, </span><span>Michael Barlow, and João Graça</span>). </span></font>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">One of my first observations
from the informative responses is that most colleagues, with one or two
exceptions, are from Computer Science departments and work
in Computational Linguistics. This might be a good excuse for my not being
aware of the enormous amount of work done on Word Alignment, as I am a linguist with a theoretical
flavour. :) Most linguists in contrastive linguistics and translation studies see sentence alignment as the only reliable and viable correspondence between linguistic units. However, when we look beyond the scope of pure language studies, alignment work covers far more than sentence alignment, as the discussion has made clear.<br>
</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">I’d summarize the
discussions as follows:</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">Word Alignments are
used in a variety of applications.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">1. All Statistical
Machine Translation systems, which start from word alignments to extract
translation units.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">2. Jointly training
models in different languages and coupling them for better learning.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">3. Passing annotations
from one language to the other.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">There are several good
implementations of word alignment: PostCAT, the Berkeley Aligner, GIZA++ (Franz Och) and
mkcls (Franz Och), just to name a few.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">Word alignments do not
have to be one-to-one; they can be many-to-many, and hence we can have phrase
alignments.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-US">--the
above adapted from </span><span><span style="color:black" lang="EN-GB">João
Graça</span></span></font></p>
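<p class="MsoNormal"><font size="2"><span lang="EN-GB">To make the many-to-many point above concrete, here is a minimal Python sketch that groups word-alignment links into connected units of source and target positions, so that a group containing several words on both sides behaves like a small phrase alignment. This is only an illustration under stated assumptions: the function name, the English/German idiom pair and the links are hand-made, and real SMT systems typically extract phrase pairs with a consistency criterion over the alignment rather than with connected components.</span></font></p>
<pre>
```python
from collections import defaultdict


def aligned_units(links):
    """Group (source, target) links into connected units of word positions."""
    # Undirected bipartite graph: nodes are ("s", i) and ("t", j).
    graph = defaultdict(set)
    for s, t in links:
        graph[("s", s)].add(("t", t))
        graph[("t", t)].add(("s", s))

    seen = set()
    units = []
    for node in sorted(graph):
        if node in seen:
            continue
        # Depth-first search collects every position reachable via links.
        stack = [node]
        seen.add(node)
        src, tgt = set(), set()
        while stack:
            side, idx = stack.pop()
            (src if side == "s" else tgt).add(idx)
            for nxt in graph[(side, idx)]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        units.append((sorted(src), sorted(tgt)))
    return units


# Hypothetical toy example (hand-made links, not aligner output).
src = ["he", "kicked", "the", "bucket"]
tgt = ["er", "biss", "ins", "Gras"]
links = [(0, 0), (1, 1), (1, 3), (2, 2), (3, 3), (3, 2)]

units = aligned_units(links)
for s_idx, t_idx in units:
    print([src[i] for i in s_idx], [tgt[j] for j in t_idx])
```
</pre>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">With these links, "he" / "er" forms a one-to-one unit, while the idiom "kicked the bucket" / "biss ins Gras" ends up as a single three-to-three unit.</span></font></p>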
<p class="MsoNormal"><font size="2"><span><span style="color:black" lang="EN-GB"> </span></span></font></p>
<p class="MsoNormal" style="text-align:left" align="left"><font size="2"><span lang="EN-US">Word alignment is, to some extent, term alignment, possibly alignment of terms with
blanks or placeholders, and possibly alignment to empty words (just
as sentences can be aligned to empty sentences).</span></font></p>
<p class="MsoNormal" style="text-align:left" align="left"><font size="2"><span lang="EN-US">Aligning 100% of the words might be difficult or even impossible (for
compound verbs, for instance).</span></font></p>
<p class="MsoNormal" style="text-align:left" align="left"><font size="2"><span lang="EN-US">Calling it ‘a joke’ (the inappropriate wording in my original post) can be
offensive to people </span><span style="color:black" lang="EN-US">working on word alignment.</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-US">--the
above adapted from Alberto
Simões</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-US"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Related implementations and literature </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">The Mathematics of Statistical Machine
Translation: Parameter Estimation.</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="IT">Peter F. Brown,
Stephen A. Della Pietra, Vincent J. Della Pietra,</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Robert L. Mercer. Computational Linguistics, 19(2):263–311,
1993.</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">The alignment template approach to
statistical machine translation.</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Franz Josef Och and Hermann Ney.
Computational Linguistics, 30(4):417–449, 2004.</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Jörg Tiedemann's book "Bitext
Alignment", which is about to be published (probably this week!) by Morgan
&amp; Claypool (<a href="http://morganclaypool.com" target="_blank">morganclaypool.com</a>) in their HLT Synthesis series.<span> </span>It includes a 45-page chapter on word
alignment (pointer provided by Graeme Hirst).</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Word alignment implementations have been
around for a while: GIZA++ (<a href="http://code.google.com/p/giza-pp/" target="_blank">http://code.google.com/p/giza-pp/</a>) is the most
widely used, but there are other word aligners such as BerkeleyAligner
(<a href="http://code.google.com/p/berkeleyaligner" target="_blank">http://code.google.com/p/berkeleyaligner</a>).</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">GIZA++ implements the alignment models
described in</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Och, Franz Josef, and Hermann Ney (2003)
"A Systematic Comparison of Various Statistical Alignment Models." </span><span lang="IT">Computational Linguistics 29(1): 19-51.
<a href="http://acl.ldc.upenn.edu/J/J03/J03-1002.pdf" target="_blank">http://acl.ldc.upenn.edu/J/J03/J03-1002.pdf</a></span></font></p>
<p class="MsoNormal"><font size="2"><span lang="IT"><span> </span></span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Dekai Wu’s "Alignment" chapter in
the Handbook of Natural Language Processing.<span>
</span>The chapter has been extensively revised for the new second edition
(2010), edited by N. Indurkhya &amp; F.J. Damerau, Chapman and Hall / CRC
Press, pp. 367–408. (It covers token vs segmental alignments, at word,
phrase/collocation, and sentence levels. Starting from flat models, it
progressively moves to compositional/hierarchical models that can handle the
sorts of constructions and idioms you are thinking about, using biparsing with
transduction grammars.)</span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">Thanks go to all
the participants in the discussion, which has been enlightening and informative indeed.</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p><font size="2">
</font><p class="MsoNormal"><font size="2"><span lang="EN-GB">Best wishes,</span></font></p><p class="MsoNormal"><font size="2"><span lang="EN-GB"><br></span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Jiajin XU</span></font></p>
<p class="MsoNormal"><font size="2"><span lang="EN-GB">Beijing Foreign Studies University</span></font></p>
<br></blockquote></div><br><br clear="all"><br>-- <br>**********************************************************************************<br> Jörg Tiedemann <a href="mailto:jorg.tiedemann@lingfil.uu.se" target="_blank">jorg.tiedemann@lingfil.uu.se</a><br>
Dep. of Linguistics and Philology <br> <a href="http://stp.lingfil.uu.se/~joerg/" target="_blank">http://stp.lingfil.uu.se/~joerg/</a><br> Uppsala University tel: +46 (0)18 - 471 1412<br>
Box 635, SE-751 26 Uppsala/SWEDEN fax: +46 (0)18 - 471 1094<br><br>
</div>