[Corpora-List] Man bites dog

Jimmy O'Regan joregan at gmail.com
Mon Nov 21 12:01:59 UTC 2011


On 21 November 2011 03:15, Mike Maxwell <maxwell at umiacs.umd.edu> wrote:
> In LILT 6 (http://elanguage.net/journals/index.php/lilt/issue/current),
> "Zipf's Law and l'Arbitraire du Signe," Martin Kay discusses statistical MT,
> and says (p.22):
>
>   Notice that a language model would, and should, guarantee
>   that the French “homme mord chien” would be translated into
>   English as “dog bites man”, rather than “man bites dog”,
>   which is what it really means.
>
> I once proposed this exact example (with Spanish rather than French) to a
> computational linguist who knew more about MT than I do.  (People who know
> more about MT than I do are quite common.  Ok, they're quite common among
> computational linguists :-).)  That person suggested I needed to learn more
> about MT.
>
> It would be nice to find myself making the same mistake that Martin Kay
> made.  It would be even nicer if it weren't a mistake.
>
> Is Kay's claim correct?  The context is of course pure statistical MT, not
> hybrid rule/ statistical systems.  Assume that the pair "homme mord chien"/
> "man bites dog" never occurs in the training data, but that the reverse does
> (or at least that "dog bites man" appears on the English side, presumably
> with some significant frequency).

That idea overlooks how statistical reordering works, and assumes a
'bag of words' based method; it also presumes that the bigrams 'man
bites' and 'bites dog' never occur. More importantly, it assumes that
'dog bites man' is the more frequent trigram in English (i.e., in the
target language model), which doesn't seem to be true
(http://books.google.com/ngrams/graph?content=man+bites+dog%2C+dog+bites+man&year_start=1800&year_end=2000&corpus=0&smoothing=3).
That makes sense in hindsight, when you consider the idiomatic value
of 'man bites dog'.
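
To make the language model point concrete, here is a minimal sketch of
a bigram model with add-one smoothing scoring the two candidate word
orders. The corpus is invented purely for illustration (it is not real
frequency data), and the names `corpus`, `log_prob` and `candidates`
are my own. A real SMT decoder would also combine this score with the
translation and reordering models, so treat this as the language model
component only.

```python
import math
from collections import Counter

# Toy English corpus: invented for illustration, NOT real frequency data.
corpus = [
    "man bites dog",
    "man bites dog",
    "dog bites man",
    "the dog bites the man",
]

def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def pad(sentence):
    """Tokenise on whitespace and add sentence boundary markers."""
    return ["<s>"] + sentence.split() + ["</s>"]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = pad(sentence)
    unigrams.update(tokens)
    bigrams.update(ngrams(tokens, 2))

vocab_size = len(unigrams)

def log_prob(sentence):
    """Log probability under a bigram model with add-one smoothing."""
    total = 0.0
    for prev, word in ngrams(pad(sentence), 2):
        total += math.log(
            (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        )
    return total

# Two candidate orderings a decoder might be choosing between.
candidates = ["man bites dog", "dog bites man"]
best = max(candidates, key=log_prob)

for c in candidates:
    print(f"{c!r}: {log_prob(c):.3f}")
print("preferred:", best)
```

The model simply prefers whichever order its training counts favour.
With these made-up counts that is 'man bites dog'; with different
counts it could go the other way, which is why the actual n-gram
frequencies matter for Kay's claim.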

It has a sort of metaphorical truth, regarding SMT's difficulties with
novelty, but it's not literally true - file it away with 'the meat is
rotten, but the vodka is good' :).

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
