<div><br></div><div>Just to make your list of applications a bit more complete:</div><div><br></div><div>* word alignment can also be useful for lexicon extraction and rule induction in non-statistical machine translation</div>

<div>* it can be used for chunk alignment in example-based MT</div><div>* word alignment is used to extract domain-specific terminology translations (mostly for computer-aided translation)</div><div>* word alignment has been used for word sense disambiguation/discrimination (using translations as "semantic mirrors" with different lexical ambiguities)</div>

<div>* it can be used to extract (WordNet-like) lexico-semantic relations (in one language and/or across languages)</div><div>* word alignment can be applied to find (monolingual) term variations, which have been used for query expansion in IR and QA</div>

<div>* the extraction of paraphrases is another application where word alignment has been used</div><div>* interestingly enough, the limitations of automatic word alignment can also be used to identify non-compositional (idiomatic) expressions</div>

<div><br></div><div>I could give you a lot of pointers if you like.</div><div><br></div><div>Clearly, there is some confusion about the term "alignment" (in word alignment it does not mean a monotonic, complete, one-to-one mapping). Automatic word alignment is certainly very noisy, so the alignment is usually not saved but only used to support another task (like translation modeling in SMT). Automatic word alignment results can sometimes (often?) look like a joke, but they can still be useful for many tasks, as you have seen in the responses to your query.</div>

<div><br></div><div>Good luck with further discussions with your colleagues!</div><div><br></div><div>Jörg</div><div><br><br><div class="gmail_quote">On Thu, Jun 2, 2011 at 5:14 PM, Xu Jiajin <span dir="ltr">&lt;<a href="mailto:ustcxujj@gmail.com">ustcxujj@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><font size="2"><span style="color:black" lang="EN-GB">Two days ago, I asked

about Word Alignment, and received kind responses from eight

colleagues (<span>Alberto Simões, Afsaneh Fazly, Mark Sammons, Graeme

Hirst, Felipe Sánchez Martínez, Dekai Wu</span><span>, </span><span>Michael Barlow, and João Graça</span>). </span></font>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">One of my first observations

from the informative responses is that most colleagues, with one or two

exceptions, are from Computer Science departments and work

in Computational Linguistics. This might be a good excuse for my not being

aware of the enormous amount of work done on Word Alignment, as I am a linguist with a theoretical

flavour. :) :) Most linguists in contrastive linguistics and translation studies see sentence alignment as the only reliable and viable correspondence of linguistic units. However, as the discussion made clear, once we look beyond the scope of pure language studies, alignment work covers far more than sentence alignment.<br>

</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">I’d summarize the

discussions as follows:</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">Word Alignments are

used in a variety of applications.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">1. All Statistical

Machine Translation systems, which start from word alignments to extract

translation units.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">2. Jointly training

models in different languages and coupling them for better learning.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">3. Passing annotations

from one language to the other.</span></font></p>
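<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">As an illustration of point 3 (added here, not taken from the respondents' messages), a minimal sketch of annotation projection might look as follows. It assumes alignments are given as (source index, target index) pairs; the function name project_tags, the tag set and the toy English-German sentence pair are hypothetical and do not come from any of the tools mentioned in this thread.</span></font></p>

<pre>
```python
def project_tags(source_tags, links, target_length):
    """Copy each source token's tag to the target tokens it is aligned with."""
    projected = [None] * target_length  # None marks unaligned target tokens
    for src, tgt in sorted(links):
        if projected[tgt] is None:  # keep the tag of the leftmost aligned source token
            projected[tgt] = source_tags[src]
    return projected


# Toy, hand-made example (an assumption for illustration only).
source = ["the", "house", "is", "small"]
source_tags = ["DET", "NOUN", "VERB", "ADJ"]
target = ["das", "Haus", "ist", "klein"]
links = {(0, 0), (1, 1), (2, 2), (3, 3)}

target_tags = project_tags(source_tags, links, len(target))
print(list(zip(target, target_tags)))
```
</pre>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">With noisy automatic alignments, the projected annotations inherit the alignment errors, so in practice some filtering is usually needed.</span></font></p>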

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">There are several good

implementations of word alignment: PostCAT, the Berkeley Aligner, GIZA++ (Franz Och),

and mkcls (Franz Och), just to name a few.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">Word alignments do not

have to be one-to-one; they can be many-to-many, and hence we can have phrase

alignments.</span></font></p>
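<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">To make the many-to-many case concrete, here is a small sketch (an illustration added here, not part of the respondents' messages). It assumes alignments are written as space-separated i-j index pairs (source-target); the helper names parse_links and aligned_units and the hand-made English-French example are hypothetical. Links that share a token are grouped into one aligned unit, which is how phrase-like alignments emerge from word links.</span></font></p>

<pre>
```python
def parse_links(text):
    """Parse 'i-j' pairs (source index, target index) separated by spaces."""
    return [tuple(int(n) for n in pair.split("-")) for pair in text.split()]


def aligned_units(links):
    """Group links that share a source or target token into one unit."""
    units = []  # each unit is a (source index set, target index set) pair
    for src, tgt in links:
        merged_src, merged_tgt = {src}, {tgt}
        remaining = []
        for unit_src, unit_tgt in units:
            if src in unit_src or tgt in unit_tgt:
                # The new link touches this unit, so absorb it.
                merged_src |= unit_src
                merged_tgt |= unit_tgt
            else:
                remaining.append((unit_src, unit_tgt))
        units = remaining + [(merged_src, merged_tgt)]
    return units


# Toy, hand-made alignment: "do not" is linked to "ne ... pas".
source = "I do not know".split()
target = "je ne sais pas".split()
units = aligned_units(parse_links("0-0 1-1 2-1 1-3 2-3 3-2"))

for unit_src, unit_tgt in units:
    print([source[i] for i in sorted(unit_src)],
          [target[j] for j in sorted(unit_tgt)])
```
</pre>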

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-US">--the

above adapted from </span><span><span style="color:black" lang="EN-GB">João

Graça</span></span></font></p>

<p class="MsoNormal"><font size="2"><span><span style="color:black" lang="EN-GB"> </span></span></font></p>

<p class="MsoNormal" style="text-align:left" align="left"><font size="2"><span lang="EN-US">Word alignment is term alignment to some extent, possibly including the alignment of terms with

blanks or placeholders, and possibly alignment to empty words (just

like we have sentence alignment to empty sentences).</span></font></p>

<p class="MsoNormal" style="text-align:left" align="left"><font size="2"><span lang="EN-US">Complete (100%) word alignment might be difficult or even impossible (for

compound verbs, for instance).</span></font></p>

<p class="MsoNormal" style="text-align:left" align="left"><font size="2"><span lang="EN-US">Calling it ‘a joke’ (the inappropriate wording in my original post) can be

offensive to people </span><span style="color:black" lang="EN-US">working on word alignment.</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-US">--the

above adapted from Alberto

Simões</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-US"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Related implementations and literature </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">The Mathematics of Statistical Machine

Translation: Parameter Estimation.</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="IT">Peter F. Brown,

Vincent J. Della Pietra, Stephen A. Della Pietra,</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Robert L. Mercer. Computational Linguistics,

19(2):263–311, 1993.</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">The alignment template approach to

statistical machine translation.</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Franz Josef Och and Hermann Ney.

Computational Linguistics, 30(4):417–449, 2004.</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Jörg Tiedemann's book "Bitext

Alignment", which is about to be published (probably this week!) by Morgan

& Claypool (<a href="http://morganclaypool.com" target="_blank">morganclaypool.com</a>) in their HLT Synthesis series.<span>  </span>It includes a 45-page chapter on word

alignment. (provided by Graeme Hirst)</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Word alignment implementations have been

around for a while: GIZA++ (<a href="http://code.google.com/p/giza-pp/" target="_blank">http://code.google.com/p/giza-pp/</a>) is the most

used, but there are other word aligners such as BerkeleyAligner

(<a href="http://code.google.com/p/berkeleyaligner" target="_blank">http://code.google.com/p/berkeleyaligner</a>).</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">GIZA++ implements the alignment models

described in</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Och, Franz Josef, and Hermann Ney (2003)

"A Systematic Comparison of Various Statistical Alignment Models." </span><span lang="IT">Computational Linguistics 29(1): 19-51.

<a href="http://acl.ldc.upenn.edu/J/J03/J03-1002.pdf" target="_blank">http://acl.ldc.upenn.edu/J/J03/J03-1002.pdf</a></span></font></p>

<p class="MsoNormal"><font size="2"><span lang="IT"><span> </span></span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Dekai Wu’s "Alignment" chapter in

the Handbook of Natural Language Processing.<span> 

</span>The chapter has been extensively revised for the new second edition

(2010), edited by N. Indurkhya & F.J. Damerau, Chapman and Hall / CRC

Press, pp.367-408. (It covers token vs segmental alignments, at word,

phrase/collocation, and sentence levels. Starting from flat models, it

progressively moves to compositional/hierarchical models that can handle the

sorts of constructions and idioms you are thinking about, using biparsing with

transduction grammars.)</span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span style="color:black" lang="EN-GB">Thanks go to all

the participants in the discussion, which was enlightening and informative indeed.</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB"> </span></font></p><font size="2">

</font><p class="MsoNormal"><font size="2"><span lang="EN-GB">Best wishes,</span></font></p><p class="MsoNormal"><font size="2"><span lang="EN-GB"><br></span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Jiajin XU</span></font></p>

<p class="MsoNormal"><font size="2"><span lang="EN-GB">Beijing Foreign Studies University</span></font></p>

<br>_______________________________________________<br>


Corpora mailing list<br>

<a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>

<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>

<br></blockquote></div><br><br clear="all"><br>-- <br>**********************************************************************************<br> Jörg Tiedemann                                     <a href="mailto:jorg.tiedemann@lingfil.uu.se" target="_blank">jorg.tiedemann@lingfil.uu.se</a><br>

 Dep. of Linguistics and Philology              <br> <a href="http://stp.lingfil.uu.se/~joerg/" target="_blank">http://stp.lingfil.uu.se/~joerg/</a><br> Uppsala University                                  tel:  +46 (0)18 - 471 1412<br>

 Box 635, SE-751 26 Uppsala/SWEDEN   fax: +46 (0)18 - 471 1094<br><br>

</div>