WordNet
Doug Cooper
doug at th.net
Sat Feb 5 11:50:26 UTC 2000
>RMC writes:
>Right now I am stymied by the problem posed by root words...
>Wonder if someone in the forum has gone there, done that.
(...with apologies for sticking to the point. There is, btw, considerable
literature on wordnet in lexicography at the on-line WN bibliography:
http://www.let.uva.nl/~ewn/corebcs/topont.htm )
Yes, I'm in the thick of it for Thai right now. On the particular issue of
treating roots as a part of speech, I faced a similar set of problems, and
eventually decided that it mixes two distinct conceptual views, to wit:
WN/EWN are hierarchies of semantic and physical relations between
real-world concepts and objects, _not_ morphological relations between
roots and derived terms -- even though that info is sometimes recorded,
eg. between adjectives and adverbs; or inferred, as when plurals
are stemmed.
The EWN xpos_near_syn relation (a typical cross-POS relation)
is consistent with this view. It is defined roughly as: "if (something)
X's, then Y takes place" (eg. dies, death). While for EWN,
"preferably there is a morphological link between the two," the
morphological relation isn't being encoded. Rather, these are
two words that relate to the same event -- a relation that is useful
for, say, information retrieval (I may not be able to find a document
about John's death, but I may see a headline like 'John Dies').
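To make the retrieval point concrete, here is a minimal sketch of query expansion over a cross-POS link. The table `XPOS_NEAR_SYN`, the toy `BASE_FORMS` stemmer, and the function names are all assumptions made up for illustration; none of it is EWN's actual data or API.

```python
# Hypothetical cross-POS links in the spirit of EWN's xpos_near_syn.
# The pairs are illustrative only, not taken from any EWN database.
XPOS_NEAR_SYN = {
    "die": {"death"},
    "death": {"die"},
}

# Toy lookup table standing in for a real morphological tool.
BASE_FORMS = {"dies": "die", "died": "die", "deaths": "death"}


def base_form(word):
    """Lower-case a word and map it to its base form if we know one."""
    w = word.lower()
    return BASE_FORMS.get(w, w)


def expand_query(terms):
    """Return the base forms of the terms plus their cross-POS partners."""
    expanded = set()
    for term in terms:
        base = base_form(term)
        expanded.add(base)
        expanded |= XPOS_NEAR_SYN.get(base, set())
    return expanded


def matches(query_terms, headline):
    """True if any expanded query term occurs in the headline."""
    wanted = expand_query(query_terms)
    words = {base_form(w.strip("'\".,")) for w in headline.split()}
    return bool(wanted & words)


if __name__ == "__main__":
    print(matches(["death"], "John Dies"))
```

The link carries no morphology at all: 'death' reaches 'John Dies' only because the two words are recorded as naming the same event.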
If the simple fact that an adverb is derived from a particular adjective,
or a relation like xpos_near_syn, isn't sufficient, I assume that you want
to store information about how to generate derived forms from roots.
While this is entirely reasonable, a WordNet implemented in this way
would be something more along the lines of a generative lexicon.
Rather than storing (most) adverbs, the rule for making an adverb would
be part of the adjective entry, and we would lose the programming
advantages that come from having each node actually filled with all
its values. In WN, this stuff is external (eg. in the "morphy" tool).
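A rough sketch of the trade-off, assuming a deliberately naive English rule and a made-up three-word table (neither reflects WN's files nor what "morphy" actually does):

```python
# Hypothetical toy data; not WordNet's actual file format.
STORED_ADVERBS = {"quick": "quickly", "happy": "happily", "good": "well"}

# Reverse index, possible only because every value is stored explicitly.
STORED_ADJECTIVES = {adv: adj for adj, adv in STORED_ADVERBS.items()}


def generate_adverb(adjective):
    """Generative style: a naive rule kept with the adjective entry.

    Final -y becomes -ily; everything else just takes -ly.
    """
    if adjective.endswith("y"):
        return adjective[:-1] + "ily"
    return adjective + "ly"


def adverb_for(adjective):
    """Populated-node style: a plain lookup of a stored value."""
    return STORED_ADVERBS[adjective]


def adjective_for(adverb):
    """Going backwards is also a plain lookup when values are stored."""
    return STORED_ADJECTIVES[adverb]


if __name__ == "__main__":
    for adj in STORED_ADVERBS:
        print(adj, generate_adverb(adj), adverb_for(adj))
```

The rule is compact, but it yields 'goodly' for 'good', and going from 'well' back to 'good' needs more than inverting the rule. With populated nodes both directions are simple lookups.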
Consider the problem of intensifiers. Do we make up a new semantic
relation is_intensified_by (that points to the intensifier), or do we add a
subordinate node that has the word+intensifier pair (as 'black' has hyponyms
jet black, pitch black, coal black, etc.)? WN (and I) take the second
position for the simple reason that once the subordinate node is
fully populated with additional values (like 'sable' or 'ebony'), we
know all the values in this synset explicitly.
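The two options can be sketched side by side. The relation name `IS_INTENSIFIED_BY`, both tables, and the grouping of these words into a single node are assumptions for illustration, not a dump of WN's actual synsets.

```python
# Option 1 (hypothetical): a new relation pointing at the intensifiers.
IS_INTENSIFIED_BY = {"black": ["jet", "pitch", "coal"]}

# Option 2: a subordinate (hyponym) node listing every value outright.
# The membership shown here is illustrative only.
HYPONYM_NODES = {
    "black": [{"jet black", "pitch black", "coal black", "sable", "ebony"}],
}


def forms_from_relation(word):
    """Compose word+intensifier pairs; the values are implied, not stored."""
    return {f"{i} {word}" for i in IS_INTENSIFIED_BY.get(word, [])}


def forms_from_node(word):
    """Read the values straight out of the subordinate nodes."""
    forms = set()
    for node in HYPONYM_NODES.get(word, []):
        forms |= node
    return forms


if __name__ == "__main__":
    print(sorted(forms_from_relation("black")))
    print(sorted(forms_from_node("black")))
```

Only the second version can hold 'sable' or 'ebony', since they are not built from an intensifier, and only there are all the values known without composing anything.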
That, as you point out, "you get more semantic relations than you
bargained for" is just a fact of life. That we don't incorporate lots of
language-specific info is also unavoidable, and IMHO is an issue
for separate tools. Using WN to manage sense definitions doesn't
rule out a 'morphNet' or 'derivNet' for other kinds of relations, and
the overall design would be cleaner for it.
Indeed, from my point of view the advantage of working with
WN is that I can use its tools and semantic hierarchy as a skeleton for
navigating a relatively poorly-lexicalized (or poorly documented)
L2 data space, with minimal coding. All I have to do is to tie each
Thai head/sense ID to the appropriate WN node; possibly extending
the implicit WN links, eg.
"thai_1_1=>foo#1" 'is a member of the foo#1 synset';
"thai_1_2->foo#2" 'is a hyponym of the foo#2 synset';
"thai_1_3<-foo#3" 'is a member of foo#3 which, in Thai, should itself
be treated as a hyponym of its current English synset (because it's
more of a distinct, well-lexicalized concept)';
then write a little front end that incorporates this info into the
data sent to/returned by the vanilla command-line WN front end
(although you might find the perl toolset at
http://www.ai.mit.edu/~jrennie/WordNet/ useful).
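As a minimal sketch of the first step of such a front end, the link notation above could be parsed along these lines. The relation labels, the `Link` type, and the function names are assumptions of this example, and nothing here talks to the WN command-line tool itself.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    thai_id: str   # Thai head/sense ID, e.g. "thai_1_1"
    relation: str  # label for the operator (names are made up here)
    synset: str    # WN sense reference, e.g. "foo#1"


# Operator meanings follow the three cases described in the text.
RELATIONS = {
    "=>": "member_of",
    "->": "hyponym_of",
    "<-": "member_treated_as_hyponym",
}

LINK_RE = re.compile(r"^(\w+)(=>|->|<-)(\w+#\d+)$")


def parse_link(text):
    """Parse one link such as 'thai_1_2->foo#2' into a Link."""
    m = LINK_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized link: {text!r}")
    thai_id, op, synset = m.groups()
    return Link(thai_id, RELATIONS[op], synset)


def index_by_synset(lines):
    """Group parsed links under the WN sense they attach to."""
    index = defaultdict(list)
    for line in lines:
        link = parse_link(line)
        index[link.synset].append(link)
    return dict(index)


if __name__ == "__main__":
    sample = ["thai_1_1=>foo#1", "thai_1_2->foo#2", "thai_1_3<-foo#3"]
    for synset, links in index_by_synset(sample).items():
        print(synset, links)
```

An index like this is what the front end would consult when decorating the output for a given WN sense with the Thai entries tied to it.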
Hope this is helpful,
Doug
__________________________________________________
1425 VP Tower, 21/45 Soi Chawakun
Rangnam Road, Rajthevi, Bangkok, 10400
doug at th.net (662) 246-8946 fax (662) 246-8789
Southeast Asian Software Research Center, Bangkok
http://seasrc.th.net --> SEASRC Web site
http://seasrc.th.net/sealang --> SEALANG Web site
More information about the An-lang
mailing list