[Corpora-List] Moving Lexical Semantics from Alchemy to Science

John F. Sowa sowa at bestweb.net
Fri Jan 28 17:29:26 UTC 2011


There are some fundamental principles about N-N compounds (and
some Adj-N compounds):

  1. The second noun is the "head", whose meaning is restricted
     by the first noun (or adjective).

  2. The only constraint on the kind of restriction is that there
     exists some relation between them.  The task is to determine
     that relation.

  3. For the "easy" cases, one or both of the words have an implicit
     pattern (call it a 'schema', 'frame', or whatever) in which the
     meaning of the other will fit.  Typical examples are nouns that
     have implicit relations in their definitions:  father, mother,
     pilot, driver, employee, food, payment.  Many of these examples
     are related to verbs, for which the case frame of the verb
     provides the slots in which the other noun can fit.

  4. For more difficult cases, the pattern that relates them is
     a longer phrase or sentence, in which neither word has a
     strong connection to the phrase:

     a) The system of planets that revolve around the sun (sol).

     b) A system for collecting energy from the sun (sol).
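Principles 2 to 4 can be illustrated with a toy sketch in Python.  Everything in the lexicon below (the frames, the semantic classes, and the paraphrases) is a hypothetical hand-built example, not a real resource, and only the head's frame is consulted.  In the easy case the function finds a slot in the head's frame that the modifier fits; otherwise it falls back to longer connecting phrases, and more than one phrase signals an ambiguous compound.

```python
# Hypothetical frames for relational nouns: slot name -> class of filler.
FRAMES = {
    "pilot":   {"flies": "vehicle"},
    "driver":  {"drives": "vehicle"},
    "payment": {"for": "service"},
}

# Hypothetical semantic classes for a few modifiers.
CLASSES = {"jet": "vehicle", "truck": "vehicle", "rent": "service"}

# Hypothetical connecting phrases for compounds that no frame covers.
PARAPHRASES = {
    ("solar", "system"): [
        "a system of planets that revolve around the sun",
        "a system for collecting energy from the sun",
    ],
}


def interpret(modifier: str, head: str) -> list[str]:
    """Return candidate relations by which `modifier` restricts `head`."""
    # Easy case: the head's frame has a slot the modifier can fill.
    # (For brevity only the head's frame is checked, not the modifier's.)
    mod_class = CLASSES.get(modifier)
    fits = [f"{head} {slot} {modifier}"
            for slot, wanted in FRAMES.get(head, {}).items()
            if wanted == mod_class]
    if fits:
        return fits
    # Difficult case: fall back to longer connecting phrases; more than
    # one candidate means the compound is ambiguous.
    return PARAPHRASES.get((modifier, head), [])


print(interpret("jet", "pilot"))     # ['pilot flies jet']
print(interpret("solar", "system"))  # the two phrases (a) and (b)
```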

Dominic W:
> compounds have a well-known property of (usually) only taking
> on some of the available meanings.

Yes.  The historically first use of the phrase 'solar system'
makes any other use confusing.  But the other use could have
become common.  Typical examples:  solar power, solar panel,
solar chips, solar heater.  The phrase 'solar system' could
easily have become a generic term for all of those -- in fact,
a sufficient amount of advertising money might make that
interpretation more common than the astronomical one.

Ramesh K:
> Surely any multi-word item involving at least one polysemous
> element would be a candidate?

Actually, both words 'solar' and 'system' are being used in
the same sense in the compound 'solar system'.  The ambiguity
arises from the fact that they can be plausibly connected
by different phrases, such as (a) and (b) above.

RK:
> Then there’s the problem of segmentation/sequence, e.g. hot water tap

Many such cases can be resolved by N-gram counts:  'hot water'
is much more common than 'water tap'.
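As a minimal sketch of the N-gram approach, a three-noun compound can be bracketed by comparing the counts of its two adjacent bigrams and grouping the stronger pair first.  The counts below are invented to mirror the claim above; real ones would come from a corpus, and the tie-breaking rule (prefer the left grouping) is an arbitrary assumption.

```python
from collections import Counter


def bracket(w1: str, w2: str, w3: str, bigrams: Counter) -> str:
    """Bracket a three-word compound by comparing adjacent bigram counts."""
    left = bigrams[(w1, w2)]   # evidence for [[w1 w2] w3]
    right = bigrams[(w2, w3)]  # evidence for [w1 [w2 w3]]
    # Counter returns 0 for unseen bigrams; ties go to the left grouping.
    if left >= right:
        return f"[[{w1} {w2}] {w3}]"
    return f"[{w1} [{w2} {w3}]]"


# Hypothetical counts, made up for illustration only.
counts = Counter({("hot", "water"): 120, ("water", "tap"): 15})
print(bracket("hot", "water", "tap", counts))  # [[hot water] tap]
```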

Yorick:
> I'm still obsessed with things like "rubber duck" (in the bath)
> doesn't go the same way as "rubber chicken" (banquet food, as well
> as being a comedy prop)--I suppose enough facts about the distribution
> of meats at banquets might make this predictable, but I'm not confident.

I would use corpus frequencies plus a semantic interpretation of the
passages in which the phrases are used.  If you discover that the
rubber chicken at banquets is actually being eaten, that would
indicate a metaphor.  And if you discover that rubber ducks are
used in bathtubs, that would indicate that they're made of rubber.

Statistics are useful for analyzing corpora, but the ultimate goal
of NLP is language understanding.  In cases where statistics are
ambiguous, use semantics to make the choice -- and vice versa.
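One possible reading of that rule as code is sketched below: statistics decide when one reading clearly leads, semantics decides when they don't, and frequency breaks any remaining tie.  The frequencies, the 0.2 margin, and the `compatible` flag (a stand-in for the result of semantically interpreting the passages, e.g. noticing that the chicken is eaten) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    label: str
    freq: float        # hypothetical relative corpus frequency
    compatible: bool   # hypothetical: does it fit what the passages say?


def choose(readings: list[Reading], margin: float = 0.2) -> str:
    """Statistics decide when one reading clearly leads; otherwise
    semantics decides; if semantics cannot single out one reading
    either, the most frequent compatible reading wins (or the most
    frequent reading overall when none is compatible)."""
    ranked = sorted(readings, key=lambda r: r.freq, reverse=True)
    if len(ranked) == 1 or ranked[0].freq - ranked[1].freq >= margin:
        return ranked[0].label  # statistics are decisive
    # Statistics are ambiguous: keep only semantically compatible readings.
    # The filter preserves frequency order, so [0] is the most frequent.
    compatible = [r for r in ranked if r.compatible]
    return (compatible or ranked)[0].label


chicken = [Reading("made of rubber", 0.50, compatible=False),
           Reading("rubbery meat (metaphor)", 0.45, compatible=True)]
duck = [Reading("made of rubber", 0.90, compatible=True),
        Reading("rubbery texture (metaphor)", 0.10, compatible=False)]
print(choose(chicken))  # rubbery meat (metaphor)
print(choose(duck))     # made of rubber
```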

Yorick:
> German is more fun than English because of its different compositionality habits

Yes.  One MT system analyzed 'Toiletteneingang' as Toilette-nein-gang
and produced the translation 'toilet denial procedure'.

That's an example where statistics would have improved the semantic
interpretation.
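A sketch of how simple unigram statistics could have blocked the 'nein' split: pick the segmentation into known words whose parts have the highest product of unigram probabilities.  Each extra part multiplies in another probability below 1, so a three-way split has to be far more frequent to beat a two-way one.  The counts and corpus size are invented for illustration, and the plural 'toiletten' is simply listed as a vocabulary entry rather than handled as Toilette plus a linking -n-.

```python
import math
from functools import lru_cache

# Hypothetical unigram counts from an imagined German corpus.
COUNTS = {"toilette": 80, "toiletten": 60, "eingang": 500,
          "nein": 900, "gang": 300}
TOTAL = 1_000_000  # hypothetical corpus size in tokens


def split_compound(word: str) -> list[str]:
    """Return the segmentation of a lowercase `word` into known words
    with the highest product of unigram probabilities (log space)."""
    @lru_cache(maxsize=None)
    def best(i: int) -> tuple[float, tuple[str, ...]]:
        if i == len(word):
            return 0.0, ()
        options = []
        for j in range(i + 1, len(word) + 1):
            part = word[i:j]
            if part in COUNTS:
                rest_score, rest = best(j)
                score = math.log(COUNTS[part] / TOTAL) + rest_score
                options.append((score, (part,) + rest))
        # -inf marks "no segmentation of word[i:] into known words".
        return max(options, default=(-math.inf, ()))

    score, parts = best(0)
    return list(parts) if score > -math.inf else [word]


print(split_compound("toiletteneingang"))  # ['toiletten', 'eingang']
```

With these counts the two-part split scores about 3e-8, against about 2e-11 for toilette + nein + gang.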

John

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


