A simpler format for OLAC vocabularies and schemes

Fri Sep 27 00:24:30 UTC 2002

Helen,

You hit the nail on the head when you observe: "in that case, I don't see
the real difference between recommendations and a centrally validated
standard".  That same observation, seen from the point of view of our
status quo, has been a key part of our motivation as Steven and I have
been thinking about what our version 1.0 standard should look like.

In version 0.4 we have a centrally validated and mandated standard, but it
has built-in optionality.  For instance, it is our standard to use SIL and
Linguist codes to identify languages precisely, but data providers also
have the option of just providing free text.  Thus the standard currently
does not require language codes but only recommends them as best practice,
and an examination of the harvested records from our 20 or so participating
data providers reveals that many sites are not now using codes.

Our proposal to take the controlled vocabularies out of the standard and to
treat them as best practice recommendations thus does not really change the
current reality.  In fact, it probably gives a better reflection of the
reality. One key advantage from the point of view of managing the
infrastructure is that it will not be necessary to change the standard when
controlled vocabularies are changed or added.  The metadata standard would
just specify the structure of the container record and the mechanism for
defining metadata extensions and would be very static.  Each controlled
vocabulary would be managed separately in an independent document and in a
formal extension definition that would supply downloadable code sets so
that extension data can still be centrally validated.  When the community
reaches a consensus that a particular vocabulary should be used when
applicable, then it would become a community Recommendation and our default
harvester would support it. Service providers would exploit it (as
Linguist is now doing with searching by language), and that would show data
providers who are not yet using the vocabulary the benefits of using it.
We could even have a "Recommended practice report card" that would show
which recommended extensions an archive is using and which it is not.
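
To make the report card idea concrete, here is a minimal sketch in Python.
Everything in it is an assumption for illustration only: the extension
names, the representation of an archive as a list of per-record sets, and
the function name report_card are hypothetical and are not taken from any
OLAC specification or harvester.

```python
from typing import Dict, Iterable, List, Set


def report_card(records: List[Set[str]], recommended: Iterable[str]) -> Dict[str, bool]:
    """Map each recommended extension to True if any record in the archive uses it.

    `records` holds, for each harvested record, the set of extension names
    that record uses (a hypothetical representation).
    """
    # Collect every extension name that appears anywhere in the archive.
    used: Set[str] = set().union(*records)
    # Report on the recommended extensions in a stable, sorted order.
    return {name: name in used for name in sorted(recommended)}


# Hypothetical archive: three records, one of which uses no extensions at all.
archive = [{"language-code"}, {"language-code", "linguistic-type"}, set()]
card = report_card(archive, ["language-code", "linguistic-type", "role"])
for name, in_use in card.items():
    print(f"{name}: {'used' if in_use else 'not used'}")
```

A fuller version might report what fraction of an archive's records use
each extension rather than a simple yes or no.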

Thus Steven and I are assuming that the end result of this change would not
weaken compliance with standardized vocabularies (which is already optional),
but that it would make it much easier to manage changes to vocabularies and
to experiment with specialized vocabularies.

I hope that helps to clarify where we are coming from.

-Gary Simons

Helen Dry <hdry at LINGUISTLIST.ORG>
Sent by: OLAC Implementers List <OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG>
09/26/02 06:36 PM
Please respond to Open Language Archives Community Implementers List
To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG
cc:
Subject: Re: A simpler format for OLAC vocabularies and schemes

Hi, Gary (and everyone),

I've just sent a long posting to the list explaining some of my problems
with Steven's & Gary's proposal, so all I want to do here is respond
briefly. I completely agree with your point about the value of syntactic
simplification.  But I'm not sure about the second point--reducing all OLAC
vocabularies to recommendations.  It's interesting where our opinions
diverge--i.e., you see the benefits to the archive, which may already have
a user-defined scheme, and I see the possible problems for the general
service provider, which may not be able to handle multiple user-defined
schemes in an efficient way.  Perhaps OLAC can handle this problem by
making STRONG recommendations . . . but in that case, I don't see the real
difference between recommendations and a centrally validated standard . . .
except for the fact that OLAC wouldn't have to re-publish all the metadata
whenever a recommendation changed.  I suppose this would be an
administrative advantage--but enough of a one to lose the potential
benefits of standardization???  I'm waiting to be convinced....

-Helen

On 24 Sep 2002 at 10:07, Gary Holton wrote:

On Mon, 16 Sep 2002 17:39:54 EDT, Steven Bird <sb at UNAGI.CIS.UPENN.EDU>
wrote:
>--
>
>So, what do you think?  Do you agree with our proposals for
>(i) a syntactic simplification in our XML representation, and
>(ii) switching OLAC vocabularies from being centrally validated
>standards to recommendations?  We would welcome your feedback.
>

Dear Steven & Gary,

I haven't had much time to digest your proposal, but my initial reaction is
very positive. Regarding (i), it is clear that a syntactic simplification
is needed. I for one have never been able to keep straight refinements vs.
schemes, and I don't think I'm alone here. Regarding (ii), as you point out,
the real issue should be not whether a particular refinement (and associated
vocabulary) has been officially adopted (mandated?), but rather whether
such a refinement is useful to the community. We can debate ontologies, but
it is more difficult to debate usefulness without actually implementing a
refinement. Your proposal would permit refinements ("extensions") to fit
the needs of the community, so that useful solutions could evolve.

I have often approached the metadata issue by trying to imagine what types
of refinements and vocabularies would be useful to the end user. The
difficulty is that we don't know enough about how the user will be
searching, what they will be searching for, and what types of search
facilities they will have. The best we can do at this point is make an
educated guess and then watch closely to see how the refinements and
vocabularies are actually used. That said, I think we have some very good
guesses already and will certainly be able to recommend best practices by
December. However, if we lock in the vocabularies then most archives will
continue to have to support both an OLAC schema and a user-defined schema
(as you point out). This would essentially remove the data provider from
the loop, in that user-defined schemas would be viewed as idiosyncratic and
non-standard. Allowing user-defined "extensions" would encourage innovation
on the part of both data and service providers--innovation mediated by the
end user.

Any reactions from others?

Gary Holton