[Corpora-List] Word Alignment does exist and goes well: A summary

Alberto Simões albie at alfarrabio.di.uminho.pt
Wed Jun 22 17:41:53 UTC 2011


Yay! Thanks!

On 22/06/2011 18:22, Dekai Wu wrote:
> It was pointed out that many people didn't have access to my "Alignment"
> chapter in the Handbook of Natural Language Processing.  Happily, CRC
> Press has (very kindly!) granted me permission to post the full PDF as
> a free sample chapter.
>
> You can download it from
> http://www.cs.ust.hk/~dekai/library/WU_Dekai/Wu_Alignment2010.pdf
>
> As summarized by Xu Jiajin:  "The chapter has been extensively revised
> for the new second edition (2010), edited by N. Indurkhya & F.J.
> Damerau, Chapman and Hall / CRC Press, pp.367-408. (It covers token vs
> segmental alignments, at word, phrase/collocation, and sentence levels.
> Starting from flat models, it progressively moves to
> compositional/hierarchical models that can handle the sorts of
> constructions and idioms you are thinking about, using biparsing with
> transduction grammars.)"
>
> Hope this helps!
>
> --
> Prof. Dekai Wu   | dekai at cs.ust.hk   | http://www.cs.ust.hk/~dekai
> HKUST Human Language Technology Center
> Department of Computer Science and Engineering
> University of Science & Technology, Clear Water Bay, Hong Kong
> tel +852 2358.7000   |  dir +852 2358.6989   |  fax +852 2358.1477
>
>
>
> Xu Jiajin wrote:
>> Two days ago, I asked about Word Alignment, and eight colleagues kindly
>> responded (Alberto Simões, Afsaneh Fazly, Mark Sammons, Graeme Hirst,
>> Felipe Sánchez Martínez, Dekai Wu, Michael Barlow, and João Graça).
>>
>> One of my first observations from the informative responses is that,
>> with one or two exceptions, most of the colleagues are from Computer
>> Science departments and work in Computational Linguistics. This might
>> be a perfect excuse for my not being aware of the enormous amount of
>> work done on Word Alignment, as I am a linguist with a theoretical
>> flavour. :) :) Most linguists in contrastive linguistics and
>> translation studies see sentence alignment as the only reliable and
>> viable correspondence of linguistic units. However, when we look
>> beyond the scope of pure language studies, alignment work covers far
>> more than sentence alignment, as the discussion made clear.
>>
>> I’d summarize the discussions as follows:
>>
>> Word Alignments are used in a variety of applications.
>>
>> 1. All Statistical Machine Translation systems start from word
>> alignments to extract translation units.
>>
>> 2. Jointly training models in different languages and coupling them
>> for better learning.
>>
>> 3. Passing annotations from one language to the other.
>>
>> There are several good implementations of word alignment: PostCAT,
>> Berkeley Aligner, GIZA++ (Franz Och), and mkcls (Franz Och), just to
>> name a few.
>>
>> Word alignments do not have to be one to one; they can be many to many,
>> and hence we can have phrase alignments.
>>
>> --the above adapted from João Graça
>>
>> Word alignment is term alignment to some extent, possibly including
>> alignment of terms with blanks or placeholders, and possibly alignment
>> to empty words (just as we have sentence alignment to empty sentences).
>>
>> 100% word alignment might be difficult or even impossible (for
>> compound verbs, for instance).
>>
>> Calling it ‘a joke’ (the inappropriate wording in my earlier post) can
>> be offensive to people working on word alignment.
>>
>> --the above adapted from Alberto Simões
>>
>> Related implementations and literature
>>
>> The Mathematics of Statistical Machine Translation: Parameter Estimation.
>>
>> Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and
>> Robert L. Mercer. Computational Linguistics, 1993.
>>
>> The alignment template approach to statistical machine translation.
>>
>> Franz Josef Och and Hermann Ney. Computational Linguistics,
>> 30:417–449. 2004.
>>
>> Jörg Tiedemann's book "Bitext Alignment", which is about to be
>> published (probably this week!) by Morgan & Claypool
>> (http://morganclaypool.com) in their HLT Synthesis series. It includes
>> a 45-page chapter on word alignment. (provided by Graeme Hirst)
>>
>> Word alignment implementations have been around for a while: GIZA++
>> (http://code.google.com/p/giza-pp/) is the most used, but there are
>> other word aligners such as BerkeleyAligner
>> (http://code.google.com/p/berkeleyaligner).
>>
>> GIZA++ implements the alignment models described in
>>
>> Och, Franz Josef, and Hermann Ney (2003) "A Systematic Comparison of
>> Various Statistical Alignment Models." Computational Linguistics
>> 29(1): 19-51. http://acl.ldc.upenn.edu/J/J03/J03-1002.pdf
>>
>> Dekai Wu’s "Alignment" chapter in the Handbook of Natural Language
>> Processing. The chapter has been extensively revised for the new second
>> edition (2010), edited by N. Indurkhya & F.J. Damerau, Chapman and
>> Hall / CRC Press, pp.367-408. (It covers token vs segmental
>> alignments, at word, phrase/collocation, and sentence levels. Starting
>> from flat models, it progressively moves to compositional/hierarchical
>> models that can handle the sorts of constructions and idioms you are
>> thinking about, using biparsing with transduction grammars.)
>>
>> Thanks go to all the participants in the discussion, which was
>> enlightening and informative indeed.
>>
>> Best wishes,
>>
>>
>> Jiajin XU
>>
>> Beijing Foreign Studies University
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page:http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>

-- 
Alberto Simoes
CCTC-UM / CEHUM
