[Corpora-List] Is a complete grammar possible (beyond the corpus itself)?

Michael Maxwell maxwell at umiacs.umd.edu
Mon Sep 10 13:36:25 UTC 2007


I wrote:
>> If the grammar under-generates... It may also
>> be a lexical problem, e.g. a word missing from the lexicon
>> (the "klatu verata nikto" problem).

Clai Rice asked:
> Mike, I'm familiar with the phrase "klatu verata nikto" but
> not with its use in this context. Do you have a reference or
> is this an informal name?

Purely informal, and probably too much so :-).

What I meant was, there are certain contexts where you can insert
*anything*: words in a foreign language (the foreign language in my
example was an alien language, from the movie "The Day the Earth Stood
Still"), non-words (the pirate exclamation "Aargh!", or animal sounds), or
even hand motions.  These contexts are typically direct quote contexts,
like "Then she said, 'Klatu verata nikto.'"  Obviously these will prevent
a parser from obtaining a parse, unless it has some sort of fall-back
mechanism.
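
One possible fall-back for the direct quote case, as a minimal sketch only:
mask the quoted span with a placeholder before parsing, so the parser sees a
single opaque token. The names QUOTE_TOKEN and mask_quotes are hypothetical,
and the quote-matching pattern is a simplifying assumption (it does not cope
with apostrophes inside single-quoted material).

```python
import re

# Hypothetical placeholder; the grammar would need one rule that accepts it
# wherever a direct quote may appear.
QUOTE_TOKEN = "<QUOTE>"

# Assumption: quotes are delimited by straight double quotes, or by single
# quotes that are not adjacent to a word character (to skip most apostrophes).
_QUOTE_PATTERN = re.compile(r'"([^"]*)"|(?<!\w)\'([^\']*)\'(?!\w)')


def mask_quotes(sentence):
    """Replace each quoted span with QUOTE_TOKEN; return (masked, quotes)."""
    quotes = []

    def _stash(match):
        # Exactly one of the two groups participates in any given match.
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        quotes.append(inner)
        return QUOTE_TOKEN

    masked = _QUOTE_PATTERN.sub(_stash, sentence)
    return masked, quotes
```

With this, "Then she said, 'Klatu verata nikto.'" would reach the parser as
"Then she said, <QUOTE>", and the quoted material is kept aside unparsed.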

More generally--and less jokingly--the problem for parsers is that of "out
of vocabulary" (OOV) words.  Often these are proper nouns, although they
can be misspellings, newly coined terms, acronyms, abbreviations, or just
words that the lexicon compiler overlooked.  The point is that a fall-back
mechanism for OOV words is not part of the grammar per se; it is some sort
of external mechanism brought in for just such a purpose.  (I presume, but
of course don't know, that this is as true for humans as it would be for
computers.)
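
To make the "external mechanism" idea concrete, here is a rough sketch of a
lexicon lookup with an OOV fall-back that sits outside the grammar. The toy
LEXICON, the tag names, and the guessing heuristic in guess_tag are all
assumptions for illustration, not a description of any particular parser.

```python
# Toy lexicon mapping lower-cased words to part-of-speech tags
# (illustrative entries only).
LEXICON = {
    "then": "ADV",
    "she": "PRON",
    "said": "VERB",
    "the": "DET",
    "parser": "NOUN",
    "failed": "VERB",
}


def guess_tag(word, sentence_initial):
    """Fall-back outside the grammar: guess a tag for an unknown word.

    Assumed heuristic: a capitalized word that is not sentence-initial is
    treated as a proper noun; anything else defaults to a common noun.
    """
    if word[:1].isupper() and not sentence_initial:
        return "PROPN"
    return "NOUN"


def tag(tokens):
    """Return (token, tag, is_oov) triples, using the fall-back for OOV words."""
    tagged = []
    for i, tok in enumerate(tokens):
        pos = LEXICON.get(tok.lower())
        if pos is None:
            # Out of vocabulary: the lexicon has no entry, so guess.
            tagged.append((tok, guess_tag(tok, i == 0), True))
        else:
            tagged.append((tok, pos, False))
    return tagged
```

The grammar proper only ever sees tags; whether a tag came from the lexicon
or from the guesser is recorded in the is_oov flag and nowhere else.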

Sorry for the unclear allusion!

   Mike Maxwell
   CASL/ U MD


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora