WordNet
RMC
rcena at epcor-group.com
Thu Feb 3 20:24:57 UTC 2000
Chaumont,
WordNet's justification for imposing part-of-speech categorization is that
"fundamental differences on the semantic organization of these syntactic
categories can be clearly seen and systematically exploited." Nouns are
organized as topical hierarchies, verbs as entailment relations, and
adjectives as n-dimensional hyperspaces. At a psychologically plausible
level, lexical memory can be characterized in terms of various kinds of
organization, each of which happens to attract certain syntactic
categories of words but not others.
I tend to agree. My particular concern, however, is that if we stay true
to WordNet's ambition (mapping the organization of lexical knowledge and
providing an index to this knowledge via semantic senses rather than
through words in alphabetical order), we may miss a significant chunk of
this knowledge in languages like Tagalog if the role of root words in the
lexical organization is ignored. I can see that the Tagalog verb _kinain_,
the noun _kainAn_, and the adjective _kai:nin_ are lexically *related* to
the root _kain_ 'eat'. Nothing prevents setting up a new WordNet relation
for this. Once again, some languages may differ fundamentally from English
in the way things are organized. English may support an analysis that
relates the adverb _beautifully_ to the adjective _beautiful_, but I can't
see any justification for such direct cross-POS relationships among
Tagalog nouns, verbs, and adjectives sharing the same root. (Tagalog
appears not to have made up its mind yet on whether to maintain proper
lexical adverbs.)
My immediate problem is that I am doing the data model, and from
experience (my day job is designing computer applications), leaving out
something as potentially significant as a root level and incorporating it
later will require a massive and costly redesign.
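A minimal Python sketch of a data model in which roots form their own level, assuming a simple in-memory design. `Root`, `LexicalEntry`, and `Lexicon` are hypothetical names, not part of WordNet. The word-to-root link is treated as a lexical relation kept apart from any semantic sense links, so entries of different parts of speech can share a root without being forced into direct cross-POS relations. Glosses for the derived forms are left out rather than guessed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Root:
    """A root stored as its own level rather than as an ordinary word."""
    form: str
    gloss: str


@dataclass(frozen=True)
class LexicalEntry:
    """A word form with a part of speech and an optional lexical link to its root."""
    form: str
    pos: str
    root: Optional[Root] = None


class Lexicon:
    """Hypothetical lexicon that indexes entries by root across parts of speech."""

    def __init__(self) -> None:
        self.roots: Dict[str, Root] = {}
        self.entries: List[LexicalEntry] = []

    def add_root(self, form: str, gloss: str) -> Root:
        root = Root(form, gloss)
        self.roots[form] = root
        return root

    def add_entry(self, form: str, pos: str, root_form: Optional[str] = None) -> LexicalEntry:
        # The root link is lexical only; sense links would live elsewhere.
        root = self.roots[root_form] if root_form is not None else None
        entry = LexicalEntry(form, pos, root)
        self.entries.append(entry)
        return entry

    def derivatives(self, root_form: str) -> List[LexicalEntry]:
        """All entries, of any part of speech, that share the given root."""
        return [e for e in self.entries if e.root is not None and e.root.form == root_form]


lex = Lexicon()
lex.add_root("kain", "eat")
lex.add_entry("kinain", "verb", "kain")
lex.add_entry("kainAn", "noun", "kain")
lex.add_entry("kai:nin", "adjective", "kain")

for e in lex.derivatives("kain"):
    print(e.form, e.pos)
```

Querying by root returns the verb, noun, and adjective together, which is the kind of index that would be costly to retrofit if the root level were left out of the first version of the schema.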
Chaumont, are we perhaps boring the rest of the group? How about taking
it offline?
Cheers!
rcena at epcor-group.com
----- Original Message -----
From: Chaumont Devin <devil at lava.net>
To: AUSTRONESIAN LANGUAGES AND LINGUISTICS <AN-LANG at anu.edu.au>
Sent: Wednesday, February 02, 2000 7:20 PM
Subject: Re: WordNet
> "RMC" <rcena at epcor-group.com> writes:
>
> >Thank you for your reply. I'm looking at George Miller's English WordNet
> >as a model, hoping to hook up to the English WordNet a Tagalog WordNet
> >(check out the English WordNet at http://www.cogsci.princeton.edu/~wn/).
>
> Yes, I have had a copy of this for some years.
>
> >Using Miller's approach, the EuroWordNet Consortium developed not only
> >intralingual wordnets for a number of European languages but also
> >provided interlingual links (http://www.hum.uva.nl/~ewn/).
>
> I was not aware of this.
>
> >So perhaps -- dreaming on -- one can *think* about an interlingual
> >Philippine WordNet, who knows.
>
> This would probably be easier than you imagine, depending upon how the
> problem is approached.
>
> >Right now I am stymied by the problem posed by root words. The English
> >WordNet ignores the relationship between the noun house and the verb
> >house.
>
> Right, and it is unable to forge links for this relationship because it is
> limited to a few semantic relation types (hypernymy, holonymy, synonymy,
> and antonymy, and perhaps one or two others).
>
> Here are the fundamental flaws of WordNet:
>
> 1. Separation of various parts of speech into different files instead of
> recognizing a semantic plane in which all semantic nodes reside.
>
> 2. No way to tailor part-of-speech to the requirements of other languages.
>
> 3. No way to add new parts of speech or semantic link types.
>
> 4. Very slow.
>
> Plus a few others, some too lengthy to describe here and some I have
> probably even forgotten.
>
> And yet semantic linkages can be tricky. You mention verb house and noun
> house, and how these two are linked. To get good results, you need people
> who thoroughly understand how such things work to ensure that the links
> get entered correctly. For example, my experience has taught me that
> nouns and non-nouns link to different semantic nodes. Thus "house" as a
> noun does NOT link directly to the same semantic node as "house" as a
> verb. But internally the two semantic nodes do link in the following
> manner: the noun "house" is employed in the action of the verb "to
> house". Enough said. I will spare you further details here.
>
> >The EuroWordNet attempted to unify nouns and verbs by introducing new
> >relations: noun-to-verb hyperonym/hyponym, verb-to-noun hyper/hypo, etc.
>
> A lot of hard research says that this is a mistake. Noun nodes do occupy
> the same semantic plane as verb nodes, but hypernymy NEVER exists between
> them. Non-noun hypernyms are also non-nouns, and noun hypernyms are
> nouns. I can't go into the reasons here, but if you are really
> interested, I will spell it all out in a separate message.
>
> >Add adjective and you've got more relations than you bargained for.
>
> "Adjective" is a part of speech, and not a semantic relation. The
> important thing is that in the ontology nouns and non-nouns (verbs,
> prepositions, adjectives, and adverbs) be handled separately. The reason
> for this is something I have called "state flow", which I could also
> explain in a separate message.
>
> >Neither approach *derives* the noun and the verb from the *root* house.
>
> The two "house" words do not in fact derive from the same root
> semantically. The noun is a thing having a roof and walls. The other
> encodes transition to a state (unhoused to housed), which employs the
> thing as a covering. They are linked semantically, but they do not derive
> from a common semantic root.
>
> >In a Tagalog WordNet it may make more sense to treat roots as a part of
> >speech for this purpose, and to maintain a lexical relation to derived
> >words. Wonder if someone in the forum has gone there, done that.
>
> This is very interesting stuff. Malay prefixes and suffixes are VERY
> regular, and yet cannot be relied upon 100% to guarantee that the derived
> word will link to the same semantic root (semantic node) as the stem word.
> A funny story about this comes out of the Moluccas. There was this
> missionary leading an evangelistic service in a church in Ternate in the
> 1960s. He was leading the testimonies, and urging people to get up and
> testify for Jesus Christ. "What's the matter with you people?" he roared.
> "Do your genitals have you bound to the benches?" The problem was that
> "malu" means shy, but "kemaluan" does not mean "shyness", as would be
> expected, but "genitalia"! That missionary almost went home several times
> because of difficulties like that, but the last time I heard he was still
> in Indonesia, and presumably speaking the language somewhat better.
>
> With best regards from Honolulu,
> Chaumont Devin.
>
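The constraints Chaumont describes in the quoted message (all nodes in one semantic plane, room to add new link types, separate noun and non-noun nodes for "house", and no hypernymy across that divide) can be sketched as follows. This is a minimal Python sketch under stated assumptions: `SemanticPlane`, node IDs such as `house.n`, and the `employed_in` link type are hypothetical names chosen for illustration, not part of WordNet or EuroWordNet. The _malu_/_kemaluan_ mapping shows stem and derived word attached to sense nodes independently, since affixation alone does not guarantee a shared semantic node.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class SemanticNode:
    """A node in the single semantic plane; is_noun separates noun from non-noun nodes."""
    node_id: str
    is_noun: bool


class SemanticPlane:
    """Hypothetical store: all nodes share one plane and link types are open-ended."""

    def __init__(self) -> None:
        self.nodes: Dict[str, SemanticNode] = {}
        self.link_types: Set[str] = {"hypernym"}
        self.links: List[Tuple[str, str, str]] = []  # (source, link_type, target)

    def add_node(self, node_id: str, is_noun: bool) -> None:
        self.nodes[node_id] = SemanticNode(node_id, is_noun)

    def add_link_type(self, name: str) -> None:
        # New relation types can be registered without changing the schema.
        self.link_types.add(name)

    def link(self, source: str, link_type: str, target: str) -> None:
        if link_type not in self.link_types:
            raise ValueError(f"unknown link type: {link_type}")
        src, dst = self.nodes[source], self.nodes[target]
        # Assumed rule from the quoted message: hypernymy never crosses noun/non-noun.
        if link_type == "hypernym" and src.is_noun != dst.is_noun:
            raise ValueError("hypernymy cannot cross the noun/non-noun divide")
        self.links.append((source, link_type, target))


plane = SemanticPlane()
plane.add_node("house.n", is_noun=True)    # the thing
plane.add_node("building.n", is_noun=True)
plane.add_node("house.v", is_noun=False)   # transition from unhoused to housed
plane.add_link_type("employed_in")

plane.link("house.n", "hypernym", "building.n")  # target is the hypernym of source
plane.link("house.n", "employed_in", "house.v")  # the noun is employed in the verb's action

try:
    plane.link("house.v", "hypernym", "house.n")  # rejected: non-noun to noun
except ValueError as err:
    print(err)  # prints: hypernymy cannot cross the noun/non-noun divide

# Words map to sense nodes individually; a derived form is never assumed
# to share its stem's node (malu 'shy' vs. kemaluan 'genitalia').
plane.add_node("shy.a", is_noun=False)
plane.add_node("genitalia.n", is_noun=True)
word_senses: Dict[str, str] = {"malu": "shy.a", "kemaluan": "genitalia.n"}
```

Keeping the noun/non-noun flag on the node, rather than in separate files per part of speech, lets the hypernymy rule be checked at link time while every node still lives in the same plane.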