[Corpora-List] dictionary definitions to glosses

Ken Litkowski ken at clres.com
Tue Dec 9 19:58:55 UTC 2003


Mike Maxwell wrote:

> Has anyone seen any work on reducing dictionary-style definitions to
> simple(r) glosses?
>


Even with the caveat noted in your followup message about working with
bilingual lexicons, the very idea raises my hackles.  Dictionary
definitions should be well crafted, so that even we computationalists
can be quite precise in exploiting their meanings.  WordNet has
frequently been criticized for its glosses (initially developed, justly
so, only as reminders for its developers), and WN 2.0 has made
substantial changes.  As for bilingual lexicons, the paper format has
surely placed severe space constraints on what can be included, so that
their entries are frequently just as cryptic and difficult to put to
"meaning-full" use; the electronic format may help in this regard,
allowing for a fuller understanding.  (Also, there are the
learners' dictionaries, which show how important it can be to provide
fuller understanding.)

The hackles now having been lowered, perhaps the main issue is how you
expect to use gisted definitions.  It would seem that a more descriptive
context would help define the task better to meet your needs.


> For example, the definition
>
>     act or process of shrinking, esp in wood; shrinkage.
>
> might reduce to 'shrinkage', and
>
>     bother; disturbance or interruption.
>
> might similarly reduce to any one of the three content words.  In some
> cases, more than one word might be output:
>
>     to carry a canoe
>
> should probably reduce to 'carry canoe', not just 'carry' or 'canoe'.
>


What are the words being defined here?  Those words themselves are the
"gist" of the definitions, probably better than any other choices you
might make.  So, again, what are you trying to accomplish?


> I can think of some heuristics, e.g. choose the least common word (in some
> sense of 'common'), but if the chosen word is the object of a verb, retain
> the verb also.  (Which requires some parsing--fortunately, verbs in English
> definitions are usually preceded by the word 'to', I suspect, so
> distinguishing verbs from nouns should not be all that difficult.)
>
> I suppose this may be related to text summarization work.
>


You are correct in suggesting the use of heuristics.  A Perl script for
this purpose can be readily developed.  Of course, it requires that you
closely examine what you want to purge.  A lexicographer colleague
developed such a script (only a few hundred lines) that strips down
full-blown definitions from one of the best dictionaries on the market
in order to assess "similarity" between definitions for fitting word
senses underneath the tops of WordNet, distinguishing possible hypernyms
and the remaining "content-full" words.  Just a matter of slogging
through it.
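
A minimal sketch of the kind of heuristic discussed above, written in
Python rather than Perl.  It is not the colleague's script.  The stop
list, the frequency table, and the function name gloss() are all
hypothetical, and "object of a verb" is approximated without parsing:
if the definition starts with "to", the first content word is taken to
be the verb and is kept alongside the rarest word.

```python
import re

# Hypothetical stop list; a real script would need a much longer one,
# tuned to the defining vocabulary of the dictionary at hand.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "in", "to", "esp",
             "by", "for", "with"}

# Illustrative frequencies only (higher = more common); a real table
# would be derived from a corpus.
FREQ = {"act": 50, "process": 60, "shrinking": 5, "wood": 20,
        "shrinkage": 2, "bother": 8, "disturbance": 6,
        "interruption": 4, "carry": 30, "canoe": 3}


def gloss(definition, freq=FREQ, default_freq=1):
    """Reduce a definition to a short gloss.

    Picks the least common content word; words missing from the
    frequency table are treated as rare (default_freq).  If the
    definition begins with 'to', the first content word is assumed
    to be the verb and is retained in front of the chosen word.
    """
    words = re.findall(r"[a-z]+", definition.lower())
    if not words:
        return ""
    is_verb_definition = words[0] == "to"
    content = [w for w in words if w not in STOPWORDS]
    if not content:
        return ""
    rarest = min(content, key=lambda w: freq.get(w, default_freq))
    if is_verb_definition and rarest != content[0]:
        return f"{content[0]} {rarest}"
    return rarest


if __name__ == "__main__":
    for d in ("act or process of shrinking, esp in wood; shrinkage.",
              "bother; disturbance or interruption.",
              "to carry a canoe"):
        print(d, "->", gloss(d))
```

With these made-up frequencies, the three definitions quoted earlier
reduce to 'shrinkage', 'interruption', and 'carry canoe'.  The results
depend entirely on the stop list and the frequency table, which is
where the slogging comes in.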

As to your summarization analogy, the best team in DUC 2003 for headline
generation (fewer than 10 words) followed an approach similar to that of
Radev et al., removing "less important fragments" from sentences
viewed as most expressing the content of newswire texts.  This did
involve working with full parses of the sentences.  I would suggest
looking at the DUC 2003 papers
(http://www-nlpir.nist.gov/projects/duc/pubs.html).

	Ken


--
Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken at clres.com
9208 Gue Road
Damascus, MD 20872-1025 USA       Home Page: http://www.clres.com


