<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=windows-1252"
http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
I think you will find a lot of relevant discussion in my "Alignment"
chapter in the Handbook of Natural Language Processing. The chapter
has been extensively revised for the new second edition (2010), edited
by N. Indurkhya & F.J. Damerau, Chapman and Hall / CRC Press,
pp. 367-408.<br>
<br>
It covers token vs segmental alignments, at word, phrase/collocation,
and sentence levels. Starting from flat models, it progressively moves
to compositional/hierarchical models that can handle the sorts of
constructions and idioms you are thinking about, using biparsing with
transduction grammars.<br>
<br>
--<br>
Prof. Dekai Wu <font color="grey"> | </font> <a class="moz-txt-link-abbreviated" href="mailto:dekai@cs.ust.hk">dekai@cs.ust.hk</a> <font
color="grey"> | </font> <a class="moz-txt-link-freetext" href="http://www.cs.ust.hk/~dekai">http://www.cs.ust.hk/~dekai</a><br>
HKUST Human Language Technology Center<br>
Department of Computer Science and Engineering<br>
University of Science & Technology, Clear Water Bay, Hong Kong<br>
<font color="grey">tel</font> +852 2358.7000 <font color="grey"> |
dir</font> +852 2358.6989 <font color="grey"> | fax</font> +852
2358.1477<br>
<br>
<br>
<br>
Graeme Hirst wrote:
<blockquote
cite="mid:8A39C6C9-EB8A-4C97-9086-A487990941B6@cs.toronto.edu"
type="cite">
<pre wrap="">Also see Jörg Tiedemann's book "Bitext Alignment", which is about to be published (probably this week!) by Morgan & Claypool (morganclaypool.com) in their HLT Synthesis series. It includes a 45-page chapter on word alignment.
On 1 Jun 2011, at 10:58, Afsaneh Fazly wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Look at these two (among many others):
The Mathematics of Statistical Machine Translation: Parameter Estimation.
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra,
Robert L. Mercer
Computational Linguistics, 1993.
The alignment template approach to statistical machine translation.
Franz Josef Och and Hermann Ney.
Computational Linguistics, 30:417–449. 2004.
On Wed, Jun 1, 2011 at 10:28 AM, Xu Jiajin <a class="moz-txt-link-rfc2396E" href="mailto:ustcxujj@gmail.com">&lt;ustcxujj@gmail.com&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Hi all,
The other day, over an academic discussion, my colleague and I had a brief
debate on WORD ALIGNMENT, while we were talking about bilingual text
aligning practices. When I was reviewing different levels of alignment, such
as sentence alignment and word alignment, I commented that WORD ALIGNMENT IS
A JOKE, as it is never likely to align words. My comment was immediately
refuted by another professor. But I have not been convinced by his
counterargument so far.
In my mind, word alignment is not realistic, since it's impossible to find
a one-to-one correspondence between parallel texts at the word level. For instance,
it's very likely that some words in a sentence are not translated, being either kept
implicit or assimilated into other words, constructions, idioms and so
forth. I reckon this is also the case for parallel texts in cognate languages.
On second thought, aligning a selection of words, say, lexical
words or jargon, across texts is not impossible. However, linguistic
alignment, as I see it, has to be exhaustive. In saying so, I actually
consider sentence alignment as the canonical type of text alignment. Each
sentence is aligned to one or more sentences in the target texts, and the
other way round.
I am wondering whether there ARE word alignment implementations in practice.
I would appreciate any pointers to relevant literature or tools, as well as
clarification of the notion of alignment.
Perhaps, unknown to me, word alignment has been a mature technology for
many years. Could anyone tell me what the main uses of word alignment are?
Bilingual lexicons? Any other applications?
Thanks in advance.
Cheers,
Jiajin XU
Ph.D., associate professor (discourse studies, corpus linguistics)
National Research Centre for Foreign Language Education
Beijing Foreign Studies University
Beijing 100089
China
_______________________________________________
UNSUBSCRIBE from this page: <a class="moz-txt-link-freetext" href="http://mailman.uib.no/options/corpora">http://mailman.uib.no/options/corpora</a>
Corpora mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Corpora@uib.no">Corpora@uib.no</a>
<a class="moz-txt-link-freetext" href="http://mailman.uib.no/listinfo/corpora">http://mailman.uib.no/listinfo/corpora</a>
</pre>
</blockquote>
</blockquote>
<pre wrap=""><!---->
--
:::: Graeme Hirst
:::: University of Toronto * Department of Computer Science
</pre>
</blockquote>
<br>
</body>
</html>