A simpler format for OLAC vocabularies and schemes

Jeff Good jcgood at SOCRATES.BERKELEY.EDU
Tue Sep 24 23:23:50 UTC 2002


Hello,

I wanted to say that I think the basic design of the revisions proposed
by Steven and Gary is very good. I completely agree with Gary Holton's
points, so I won't repeat them.

I thought I'd point out how I think these revisions can be usefully
applied to some problems faced by the working group evaluating the
linguistic types document. I think this new format will allow us to get
past many issues that I had thought might be intractable. I guess I
consider this a good "empirical" test of the proposal.

The specific problem was that there are many cross-cutting ways to
classify the "type" of a linguistic document. There's a sense in which a
document focuses on a big sub-field of linguistics like phonology,
morphology, etc. There's the basic structure of a document: dictionary,
grammar, text (the term "macrostructure" can be used to describe this
category). And then there are important "meso/micro-structure" aspects
of documents, like the type of transcription used (free translation,
interlinear, etc.).

The original OLAC system encouraged us to create an ontology of document
types which assumed that there was one "type" for a document, when, in
reality, type is a multi-dimensional concept. As we realized this, we
started to break down the types into the most important dimensions--like
linguistic subject, basic structure, etc. But even then, there were
problems of classification. For example, categories like "oratory",
"narrative", and "ludic" seemed appropriate for some linguistic
documents, but it isn't immediately clear where they belong in a
hierarchy of types (are they structural or content types? or are they
something else?).

It was possible to create a system of types that works, but I think
many of our conceptual and implementational problems can be more cleanly
solved by the new system because of its extensibility.

Specifically, rather than having to pigeonhole types into a few
categories in a hierarchy, we can just propose a series of vocabularies
corresponding to the potentially independent "type" parameters of a
document--for example, a linguistic subject vocabulary, a document
structural type vocabulary, a "discourse"-type vocabulary for things
like "oratory" and "narrative". (There are relevant recent posts on the
Metadata list, including one from me, that go into more detail.)
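To make the idea of independent "type" vocabularies concrete, here is a minimal sketch, assuming a document's types are stored as a mapping from facet name to a set of terms. The facet names ("linguistic-subject", "structure", "discourse"), the `Document` class, the `find` helper, and the example title are all hypothetical and only illustrate the multi-dimensional approach; they are not part of the OLAC proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    # Hypothetical layout: each facet (one per vocabulary) maps to a set of
    # terms, so a document can take values from several vocabularies at once
    # instead of being pigeonholed into a single "type".
    types: dict = field(default_factory=dict)

    def classify(self, facet: str, term: str) -> None:
        # Add a term under a facet, creating the facet if it is new.
        self.types.setdefault(facet, set()).add(term)


def find(docs, facet, term):
    # Return titles of documents that carry `term` under `facet`.
    return [d.title for d in docs if term in d.types.get(facet, set())]


doc = Document("Hypothetical folktale collection")
doc.classify("structure", "text")
doc.classify("discourse", "narrative")
doc.classify("linguistic-subject", "morphology")

print(find([doc], "discourse", "narrative"))  # ['Hypothetical folktale collection']
print(find([doc], "structure", "grammar"))    # []
```

Because each facet is independent, adding a new vocabulary later only means using a new facet name; existing classifications are untouched.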

Over time, I'm sure we'll find that some of the vocabularies are more
useful, and more widely used, than others, and these can become OLAC
recommended standard vocabularies. I think the real value of the new system will be that it
is much more forgiving/flexible if we find we need to adapt our "type"
categories in the future.

Since Steven just posted about the idea that vocabularies should be
recommended practices, I'll say that I think that aspect of the proposal
is also very helpful to working out a linguistic type vocabulary. One
thing I, at least, am convinced of from the discussion of "types" is
that there is a counterexample to every generalization you can make
about them. It may be the case that some counterexamples are minor
enough that we can get away without a good classification for them. Or
it might be the case that a counterexample reveals a set of important
omissions in the proposals. It's hard to tell without testing a lot of
archives.

A recommended, but not enforced, vocabulary would address this
problem: as archivers encounter situations that aren't covered, they
wouldn't be forced to "fit" their document into a category where it
doesn't belong. This would not only promote the creation of needed new
vocabulary items but also maintain the integrity of existing ones.
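The difference between a recommended and an enforced vocabulary can be sketched as a check that flags unlisted terms instead of rejecting them. This is a minimal illustration under assumed names: `check_terms`, `RECOMMENDED_DISCOURSE`, and the term "lament" are hypothetical, and the real OLAC validation machinery is not modeled here.

```python
# Hypothetical recommended vocabulary for a discourse-type facet; the terms
# are the examples from this post, not an official OLAC list.
RECOMMENDED_DISCOURSE = {"oratory", "narrative", "ludic"}


def check_terms(terms, recommended):
    """Accept every term, but report those outside the recommended list.

    Unlike an enforced vocabulary, nothing is rejected: unlisted terms are
    kept as-is and returned separately as candidates for new vocabulary items.
    """
    accepted = list(terms)
    unlisted = [t for t in terms if t not in recommended]
    return accepted, unlisted


accepted, unlisted = check_terms(["narrative", "lament"], RECOMMENDED_DISCOURSE)
print(accepted)  # ['narrative', 'lament']
print(unlisted)  # ['lament']
```

An enforced vocabulary would instead refuse the record containing "lament", pushing the archiver to force the document into an ill-fitting category.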

Additionally, the idea of recommended vocabularies, plus a best practice
standard, certainly is more in line with the general spirit of OLAC, and
I think it would encourage more subcommunities to get involved and
create the vocabularies they need.

Jeff



More information about the Olac-implementers mailing list