WordNet
Chaumont Devin
devil at lava.net
Fri Feb 4 00:51:29 UTC 2000
"RMC" <rcena at epcor-group.com> writes:
>Chaumont, are we perhaps boring the rest of the group? How about taking
>it offline?
I would hope that a little exposure to what you are talking about might
help get some others thinking about the same problems. People can't learn
and grow without being willing to expose themselves to new ideas.
Going after Austronesian languages through the ontology (WordNet, semantic
net, etc.) approach cannot help but be fruitful in more ways than
one. The two main reasons for doing so that I can see right now are:
1. The ontology enables us to see a lot of things about the way people
within a particular culture perceive their world.
2. We need ontologies for these languages for purposes of machine
translation, which will be coming along in the next few years.
Probably the most important thing in all of this is to know what one is
doing. One of the reasons I cannot use WordNet in any practical way is
that it (at least the old version I downloaded) is shot through with
errors. If it were reliable, I could simply write software to translate
everything in WordNet to the format I wish to use for my parser, and
everything would work okay.
There are a lot of problems with all of this, but I think a couple might
be summed up as follows:
1. People tend to allow their own preconceptions about linguistic
phenomena to get in the way of real language analysis.
2. People make a lot of human errors when manually entering data.
The best way to deal with #1 is to gain a strong theoretical understanding
of what is going on (kinda sounds contradictory, doesn't it? But really
it's not). And the best way to deal with #2 is to create software
capable of automatically gathering semantic information about particular
languages. As a simple example, in English, attributive adjectives come
just before nouns, so that it is not all that difficult to write software
capable of finding adjective-noun pairs. Such pairs will tell you a lot
about a language if the information that can be gathered from them can be
correctly entered in an ontology (by ontology, I mean a semantic network
like WordNet).
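
To make the adjective-noun idea concrete, here is a minimal sketch in Python. It assumes you already have word lists of adjectives and nouns; the tiny ADJECTIVES and NOUNS sets and the function names below are made up for illustration, and a real system would need a full lexicon or a part-of-speech tagger. It only looks for an adjective immediately followed by a noun, which is a rough approximation even for English.

```python
import re
from collections import Counter

# Hypothetical toy word lists, for illustration only.
ADJECTIVES = {"red", "old", "small", "heavy"}
NOUNS = {"house", "boat", "stone", "dog"}


def find_adj_noun_pairs(text, adjectives=ADJECTIVES, nouns=NOUNS):
    """Return (adjective, noun) pairs where the adjective directly precedes the noun.

    Note: this ignores sentence boundaries, so a real tool should
    split the text into sentences first.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs = []
    for left, right in zip(tokens, tokens[1:]):
        if left in adjectives and right in nouns:
            pairs.append((left, right))
    return pairs


def attributes_by_noun(pairs):
    """Group the adjectives seen with each noun, keeping a count of each."""
    table = {}
    for adj, noun in pairs:
        table.setdefault(noun, Counter())[adj] += 1
    return table


sample = (
    "The old boat passed a small house. "
    "An old dog slept by the heavy stone near the old boat."
)
pairs = find_adj_noun_pairs(sample)
table = attributes_by_noun(pairs)
```

The resulting table (noun mapped to the adjectives observed with it) is the kind of raw material that could then be checked and entered in an ontology.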
There is a very important assumption underlying all of this that goes
something like this:
All texts encode things that either remain in or transition to
various states, together with the agents responsible for maintaining
or changing those states.
Armed with this knowledge, we can begin to analyze how state "flows" from
words encoding states to words encoding things. A clear understanding of
this phenomenon of "state flow" is essential if one is to create reliable
and effective ontologies (effective meaning that they will hold enough
information to be of use later on in machine translation).
A good ontology will tell you not only the kinds of things WordNet can,
but also things like which subjects and objects can go with which verbs.
Make sure the system you are using is capable of doing all of this
before getting started; otherwise you may have regrets later on.
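
As a rough illustration of what "which subjects and objects can go with which verbs" might look like in a data structure, here is a small Python sketch. Everything in it (the Ontology class, its method names, the sample words) is hypothetical and is not the format of WordNet or of any existing parser. It also assumes each word has only one parent category, which is a simplification.

```python
class Ontology:
    """Toy semantic network: is-a links plus subject/object classes for verbs."""

    def __init__(self):
        self.parent = {}       # word -> its more general category
        self.verb_frames = {}  # verb -> (subject category, object category)

    def add_is_a(self, word, category):
        self.parent[word] = category

    def add_verb(self, verb, subject_category, object_category):
        self.verb_frames[verb] = (subject_category, object_category)

    def is_a(self, word, category):
        """Walk up the is-a chain from word, looking for category."""
        seen = set()
        while word is not None and word not in seen:
            if word == category:
                return True
            seen.add(word)  # guards against accidental cycles in the data
            word = self.parent.get(word)
        return False

    def allows(self, verb, subject, obj):
        """True if subject and obj fit the categories recorded for verb."""
        frame = self.verb_frames.get(verb)
        if frame is None:
            return False
        subject_category, object_category = frame
        return self.is_a(subject, subject_category) and self.is_a(obj, object_category)


onto = Ontology()
onto.add_is_a("dog", "animal")
onto.add_is_a("animal", "entity")
onto.add_is_a("rice", "food")
onto.add_is_a("food", "entity")
onto.add_is_a("stone", "entity")
onto.add_verb("eat", "animal", "food")

ok = onto.allows("eat", "dog", "rice")
bad_subject = onto.allows("eat", "stone", "rice")
bad_object = onto.allows("eat", "dog", "stone")
```

Here "dog eats rice" is accepted while "stone eats rice" and "dog eats stone" are rejected, because the verb entry carries the categories its subject and object must belong to.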
Anyhow, I am looking forward to seeing your results, because I am fluent
in Malay and Buru, and I would imagine that a Tagalog world (a
Tagalog ontology) will show many interesting conceptual similarities to
the worlds of these languages.
With best regards from Honolulu,
Chaumont Devin.