A Request for Agreement on Standardization

Chaumont Devin devil at lava.net
Sat Mar 11 19:48:51 UTC 2000


Dear Austronesian Linguists,

I suppose many of you have seen parts of the reconstruction Robert Blust
has been doing for Proto-Austronesian.  In it you will notice that for a
given proto form, the glosses for reflexes in several protolanguages are
either 100% identical or else nearly the same.

I have been trying to get people interested in cooperating on an
ontology (semantic network) for Austronesian languages, which would be a
very powerful tool for linguists, both for analyzing the data already
available and for knowing what to look for in the future.  In order to
cooperate on building such a reference, all of us will have to be "using
the same language".  That is, each gloss will have to be assigned an
identifying integer in the range 256-8000 or so, and this numbering
system will have to be one that all of us can live with.

The way this will work is as follows.  Suppose you tell your computer you
are interested in semantic node #2185.  Your computer will then display
the number, 2185, and the gloss associated with this node.  Then it will
display all the words in all Austronesian languages that link to this
node.  If we ever get all of the Austronesian languages defined, this will
probably be too many, so you will have to tell your computer ahead of time
what subset of languages you are interested in.  These words will be
displayed with the part-of-speech types that link them to semantic node
#2185.  Then, if #2185 is a verb, it will display the various kinds of
agents and patients that can be related through #2185, as well as
beneficiary types and other appropriate information.  There won't be too
many of these, since for each only the highest hypernym will be selected.
For example, suppose the gloss for #2185 meant "fly like a bird".  Then
the system would not show ducks, geese, eagles, chickens, etc., but only
"bird" as a potential agent.  And then, because "fly like a bird" is a
kind of "fly through the air", which might also have a hyponym "fly like
an airplane", there would also be a hypernym link to "fly through the
air", and so on.
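To make the lookup above more concrete, here is a minimal Python sketch under assumed data structures.  The `Node` class, the functions `ancestors`, `can_be_agent` and `show_node`, every node number other than #2185, and the languages and words (langA, wordA, and so on) are hypothetical placeholders, not part of any existing system.  It shows the two ideas in the paragraph: words are listed only for the chosen subset of languages, and an agent restriction is stored once, at the highest suitable hypernym ("bird"), so that "duck" or "eagle" qualifies by walking up its hypernym links.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A semantic node: an agreed integer ID, its gloss, and an optional hypernym link."""
    node_id: int
    gloss: str
    hypernym: Optional[int] = None                 # ID of the next more general node
    agents: set[int] = field(default_factory=set)  # permitted agent types, highest hypernym only


# Hypothetical sample network; only #2185 and the glosses come from the text.
NODES = {
    300: Node(300, "bird"),
    301: Node(301, "duck", hypernym=300),
    302: Node(302, "eagle", hypernym=300),
    310: Node(310, "stone"),
    2184: Node(2184, "fly through the air"),
    2185: Node(2185, "fly like a bird", hypernym=2184, agents={300}),
}

# Hypothetical word links: (language, word, part-of-speech mnemonic, node ID).
WORDS = [
    ("langA", "wordA", "vrb", 2185),
    ("langB", "wordB", "vrb", 2185),
    ("langC", "wordC", "vrb", 2185),
]


def ancestors(node_id: int) -> list[int]:
    """Return the node and every hypernym above it, most specific first (assumes no cycles)."""
    chain = []
    current: Optional[int] = node_id
    while current is not None:
        chain.append(current)
        current = NODES[current].hypernym
    return chain


def can_be_agent(noun_id: int, verb_id: int) -> bool:
    """A noun qualifies if it, or any of its hypernyms, is listed as an agent of the verb."""
    allowed = NODES[verb_id].agents
    return any(n in allowed for n in ancestors(noun_id))


def show_node(node_id: int, languages: set[str]) -> list[str]:
    """Describe a node, listing only words from the requested subset of languages."""
    node = NODES[node_id]
    lines = [f"#{node.node_id} {node.gloss}"]
    for lang, word, pos, target in WORDS:
        if target == node_id and lang in languages:
            lines.append(f"  {lang}: {word} ({pos})")
    for agent in sorted(node.agents):
        lines.append(f"  agent: {NODES[agent].gloss}")
    if node.hypernym is not None:
        parent = NODES[node.hypernym]
        lines.append(f"  hypernym: #{parent.node_id} {parent.gloss}")
    return lines


if __name__ == "__main__":
    print("\n".join(show_node(2185, {"langA", "langB"})))
    print(can_be_agent(301, 2185))  # duck -> bird, so True
    print(can_be_agent(310, 2185))  # stone has no "bird" hypernym, so False
```

Storing "bird" once on #2185, rather than listing every kind of bird, keeps the display short, which is the point made above; the cost is one walk up the hypernym chain for each check.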

But for many people to cooperate in providing all the words linking to
#2185 from all Austronesian languages, we will first have to agree on
what the gloss for #2185 will be, and this is what I am after right now.
Yesterday I discussed the problem with Dr. Cecil H. Brown, and a day or
two before that with Dr. Blust himself (who is not very keen on this idea
of mine).  As it turns out, apparently nobody has ever attempted an
analysis of the core glosses for Austronesian languages before, and few
attempts have been made for ANY language.  Dr. Brown did point me to a
book comparing glosses across several European languages, and these were
even numbered, but unfortunately the numbering system included decimal
points, which would not suit our purposes, since each gloss needs a
single integer ID.  Also, the glosses themselves tend not to coincide with
the glosses common for Austronesian languages.

So what I need from you is agreement upon a set of glosses we can use in
order to create an ontology for Austronesian languages, and this will
require agreement upon the following questions:

1. How can we arrive at a set of glosses suitable for this task?  Of
course it will not be possible to get every possible gloss for every
possible word in every possible Austronesian language in one fell swoop.
What we need is a list of the most common ones, which can serve as the
core of our ontology.  Suppose, for example, that we already have a gloss
for "fly like a bird", and then a researcher working in some remote
region of Melanesia discovers a gloss "fly like a bird with a broken
wing".  This can then easily be added to the ontology by giving its
semantic node a hypernym link to the node for "fly like a bird".  So what
we are looking for now is just a core list.

2. Is there any numeric ordering that would be preferable to any other?
For computational purposes, of course, the best solution is to put the
most common glosses nearest the top (that is, to give them the lowest
numbers), but this may not be good for all purposes, and we need to know
in advance whether anyone has objections to a particular ordering.
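Both questions can be illustrated with a short Python sketch, assuming frequency counts are available for each candidate gloss.  Core glosses are numbered from 256 upward, most common first, as point 2 suggests, and a gloss reported later, such as "fly like a bird with a broken wing", simply takes the next free number plus a hypernym link to its parent, as in point 1.  The helper names `assign_core_ids` and `add_gloss`, the counts, and the exact upper bound of 8000 are illustrative assumptions.

```python
from __future__ import annotations

FIRST_ID = 256   # lowest node number in the proposed range
LAST_ID = 8000   # "or so": the exact upper bound of the core range is an assumption


def assign_core_ids(counts: dict[str, int]) -> dict[str, int]:
    """Number core glosses from FIRST_ID upward, most frequent first, ties alphabetical."""
    ordered = sorted(counts, key=lambda gloss: (-counts[gloss], gloss))
    if FIRST_ID + len(ordered) - 1 > LAST_ID:
        raise ValueError("too many core glosses for the agreed range")
    return {gloss: FIRST_ID + rank for rank, gloss in enumerate(ordered)}


def add_gloss(ids: dict[str, int], hypernyms: dict[int, int],
              gloss: str, parent: str) -> int:
    """Give a newly reported gloss the next free number and a hypernym link to its parent.

    Existing numbers are never changed, so lists already distributed stay valid.
    """
    if gloss in ids:
        raise ValueError(f"gloss already numbered: {gloss!r}")
    parent_id = ids[parent]            # fails before any change if the parent is unknown
    new_id = max(ids.values()) + 1
    ids[gloss] = new_id
    hypernyms[new_id] = parent_id
    return new_id


if __name__ == "__main__":
    counts = {"bird": 40, "fly like a bird": 31, "fly through the air": 25}  # made-up counts
    ids = assign_core_ids(counts)
    print(ids)  # {'bird': 256, 'fly like a bird': 257, 'fly through the air': 258}
    hypernyms = {ids["fly like a bird"]: ids["fly through the air"]}
    new = add_gloss(ids, hypernyms, "fly like a bird with a broken wing", "fly like a bird")
    print(new, hypernyms[new])  # 259 257
```

Appending rather than renumbering means that numbers already distributed to contributors never change, whatever ordering is finally agreed for the core list.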

I would very much like to see this project get under way because it could
provide us with all kinds of interesting reports and information.  For
example, computer algorithms might be written that could propose proto
forms for words by comparing words found in modern languages.  It would
then be possible to ask for reconstruction reports based on thousands of
different combinations of languages for various purposes.  I have already
pointed out how such a system would be able to answer queries such as
those in the recent postings about the Malay word "anu".  And these
things are only the tip of the iceberg.  The most fascinating
possibilities include natural-language computer interfaces for any
Austronesian language, and machine translation between Austronesian
languages.  Neither would be possible in the immediate future, but both
could eventually be built on this work.  Then some New Guinea highlander
would be able to come home to his hut, lay down his bow and arrows, get
onto his computer, give us a report on his hunting experiences of the
day, and ask what the weather was like in Fairbanks, Alaska, and we would
be able to read his messages in English, and he would be able to read our
responses in Fore (except that, as luck would have it, Fore is probably
not an Austronesian language!  But you know what I mean).  What we are
talking about are incredible new linguistic and communications
possibilities in the 21st century.

And one of the great things about this is that you probably don't need to
invest anything except some of your time to take part, because once we
agree on a set of core glosses, these can be distributed, and then in
order to contribute for whatever language, all you will need to do for
the most part is write lines like the following and send them to the
project coordinator:

manu sng 1935 122
fiti vrb 2917 122
nangu vrb 3581 122
Etc.

Here the three-letter code after each word is its part-of-speech
mnemonic, the first number is the semantic node identifier you have
looked up for the gloss you associate with the word, and 122 is either
your personal ID code or the ID code of the person who originally
contributed the word, that is, the author of the original dictionary or
word list from which you copied the word, or whatever the source was.
The project coordinator will then feed this information into his or her
computer, where the ID codes and the part-of-speech mnemonics will be
converted into internal linkages, and the words you have contributed will
become part of the developing ontology.  The latest version of the
ontology will be available for the cost of shipping plus a small charge
for the disk and the time needed to write it.
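On the coordinator's side, reading such lines could look like the Python sketch below.  It handles only the four-field format shown above (word, part-of-speech mnemonic, semantic node, contributor ID), assumes that a word never contains spaces, and leaves out the conversion of mnemonics into internal linkages, since that scheme has not been defined; `Contribution`, `parse_line` and `parse_submission` are hypothetical names.

```python
from typing import List, NamedTuple


class Contribution(NamedTuple):
    word: str
    pos: str          # part-of-speech mnemonic, e.g. "sng" or "vrb"
    node_id: int      # semantic node looked up for the word's gloss
    contributor: int  # ID of the contributor or of the original source's author


def parse_line(line: str) -> Contribution:
    """Parse one 'word pos node contributor' line (assumes the word has no spaces)."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    word, pos, node, contributor = fields
    return Contribution(word, pos, int(node), int(contributor))


def parse_submission(text: str) -> List[Contribution]:
    """Parse every non-blank line of a submission."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "manu sng 1935 122\nfiti vrb 2917 122\nnangu vrb 3581 122\n"
    for record in parse_submission(sample):
        print(record)
```

Rejecting a line with the wrong number of fields, instead of guessing, lets the coordinator send a malformed submission back to its author.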

I hope I have given a clear sketch of what is required and some of the
results we might expect from this work.  Any questions or suggestions
would be welcome.

With best regards from Honolulu,
Chaumont Devin.

PS In the above I forgot to mention an integer language ID, which will
also be needed by the computer.  However, this and the contributor's ID
can easily be added at the end of every line by any decent text editor,
and so need not be typed each time.
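For contributors who would rather not use a text editor, the same step takes a few lines of Python.  The function name `append_ids`, the order of the two IDs (contributor first, then language) and the language ID 17 are assumptions for illustration only; the order would have to be fixed by the coordinator.

```python
def append_ids(text: str, contributor: int, language: int) -> str:
    """Append the same contributor and language IDs to the end of every non-blank line."""
    out = []
    for line in text.splitlines():
        line = line.rstrip()
        out.append(f"{line} {contributor} {language}" if line else line)
    return "\n".join(out)


if __name__ == "__main__":
    typed = "manu sng 1935\nfiti vrb 2917\nnangu vrb 3581"
    print(append_ids(typed, contributor=122, language=17))  # 17 is a made-up language ID
```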



More information about the An-lang mailing list