[Corpora-List] WORD ALIGNMENT: Does it exist?

Dekai Wu dekai at cs.ust.hk
Wed Jun 1 19:05:53 UTC 2011


I think you will find a lot of relevant discussion in my "Alignment" 
chapter in the Handbook of Natural Language Processing.  The chapter has 
been extensively revised for the new second edition (2010), edited by N. 
Indurkhya & F.J. Damerau, Chapman and Hall / CRC Press, pp. 367-408.

It covers token vs segmental alignments, at word, phrase/collocation, 
and sentence levels.  Starting from flat models, it progressively moves 
to compositional/hierarchical models that can handle the sorts of 
constructions and idioms you are thinking about, using biparsing with 
transduction grammars.
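
For readers who want something concrete, the flat, token-level starting
point can be sketched in a few lines. The following is only a minimal
illustration in the spirit of IBM Model 1 (Brown et al. 1993, cited further
down in this thread); it is not code from the chapter. The names train_ibm1
and align, the "NULL" token, and the toy German-English corpus are all
assumptions made for the example. The NULL source token is one simple way to
let a word align to nothing, which bears on the worry about words that are
left untranslated.

```python
def train_ibm1(bitext, iterations=10):
    """Estimate t(target_word | source_word) with EM, IBM Model 1 style.

    bitext: list of (source_tokens, target_tokens) pairs.
    Returns a dict mapping (target_word, source_word) -> probability.
    """
    # Prepend a hypothetical NULL token so target words may align to nothing.
    bitext = [(["NULL"] + src, tgt) for src, tgt in bitext]
    tgt_vocab = {tw for _, tgt in bitext for tw in tgt}
    # Uniform initialisation over word pairs that co-occur in some sentence pair.
    t = {(tw, sw): 1.0 / len(tgt_vocab)
         for src, tgt in bitext for tw in tgt for sw in src}
    for _ in range(iterations):
        count = {}  # expected count of (target, source) links
        total = {}  # expected count of links leaving each source word
        # E-step: share each target word among the source words in its sentence.
        for src, tgt in bitext:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] = count.get((tw, sw), 0.0) + frac
                    total[sw] = total.get(sw, 0.0) + frac
        # M-step: renormalise expected counts into probabilities.
        t = {(tw, sw): c / total[sw] for (tw, sw), c in count.items()}
    return t


def align(src, tgt, t):
    """Link each target word to its most probable source word (or NULL)."""
    src = ["NULL"] + src
    return [(tw, max(src, key=lambda sw: t.get((tw, sw), 0.0))) for tw in tgt]


# Toy corpus, invented purely for illustration.
bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(bitext)
print(align(["das", "Haus"], ["the", "house"], t))
```

Note that this yields many-to-one links rather than a strict one-to-one
mapping, and it knows nothing about phrases or idioms; handling those is
where the segmental and hierarchical models come in.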

--
Prof. Dekai Wu   |   dekai at cs.ust.hk   |   http://www.cs.ust.hk/~dekai
HKUST Human Language Technology Center
Department of Computer Science and Engineering
University of Science & Technology, Clear Water Bay, Hong Kong
tel +852 2358.7000   |  dir +852 2358.6989   |  fax +852 2358.1477



Graeme Hirst wrote:
> Also see Jörg Tiedemann's book "Bitext Alignment", which is about to be published (probably this week!) by Morgan & Claypool (morganclaypool.com) in their HLT Synthesis series.  It includes a 45-page chapter on word alignment.
>
>
>
> On 1 Jun 2011, at 10:58, Afsaneh Fazly wrote:
>
>   
>> Look at these two (among many others):
>>
>> The Mathematics of Statistical Machine Translation: Parameter Estimation.
>> Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra,
>> Robert L. Mercer.
>> Computational Linguistics, 1993.
>>
>> The alignment template approach to statistical machine translation.
>> Franz Josef Och and Hermann Ney.
>> Computational Linguistics, 30:417–449. 2004.
>>
>> On Wed, Jun 1, 2011 at 10:28 AM, Xu Jiajin <ustcxujj at gmail.com> wrote:
>>     
>>> Hi all,
>>>
>>>
>>>
>>> The other day, during an academic discussion, my colleague and I had a
>>> brief debate about WORD ALIGNMENT while we were talking about bilingual
>>> text alignment practices. When I was reviewing different levels of
>>> alignment, such as sentence alignment and word alignment, I commented that
>>> WORD ALIGNMENT IS A JOKE, as it is never likely to actually align words.
>>> My comment was immediately disputed by another professor, but I have not
>>> been convinced by his counterargument so far.
>>>
>>>
>>>
>>> In my mind, word alignment is not realistic, since it is impossible to find
>>> one-to-one correspondences between parallel texts at the word level. For
>>> instance, it is quite likely that some words in a sentence are not
>>> translated at all: they are either left implicit or absorbed into other
>>> words, constructions, idioms and so forth. I reckon this is also the case
>>> for parallel texts in cognate languages.
>>>
>>>
>>>
>>> But on second thought, aligning a selection of words, say lexical words or
>>> jargon, across texts is not impossible. However, linguistic alignment, as
>>> I see it, has to be exhaustive. In saying so, I am actually treating
>>> sentence alignment as the canonical type of text alignment: each sentence
>>> is aligned to one or more sentences in the target text, and vice versa.
>>>
>>>
>>>
>>> I am wondering whether there ARE word alignment implementations in practice.
>>> I would appreciate any pointers to relevant literature or tools, as well as
>>> clarification of the notion of alignment.
>>>
>>>
>>>
>>> Perhaps, unknown to me, word alignment has been a mature technology for
>>> many years. Could anyone tell me what the main uses of word alignment are?
>>> Bilingual lexicons? Any other applications?
>>>
>>>
>>>
>>> Thanks in advance.
>>>
>>> Cheers,
>>>
>>> Jiajin XU
>>> Ph.D., associate professor (discourse studies, corpus linguistics)
>>> National Research Centre for Foreign Language Education
>>> Beijing Foreign Studies University
>>> Beijing 100089
>>> China
>>>
>>> _______________________________________________
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no
>>> http://mailman.uib.no/listinfo/corpora
>>>
>>>
>>>       
>
>
> --
> ::::  Graeme Hirst
> ::::  University of Toronto * Department of Computer Science
