A simpler format for OLAC vocabularies and schemes

Helen Aristar Dry hdry at LINGUISTLIST.ORG
Fri Sep 27 16:46:45 UTC 2002


Hi, Gary,

Yes, I take your point that we can't force compliance; and, in
general, I'd be all for letting standards evolve from usage.  But
actually, from the point of view of the LINGUIST service provider,
the languages example isn't a heartening one.  What our
programmer had to do to search harvested OLAC metadata by
subject language was write a special program that translates any text
entry in the subject language field into the SIL code.  This is
possible to do with languages only because we have the
Ethnologue name and alternate name tables on the site, and
therefore we have a list of almost all the language names that any
site might be using.  It's still a lot of work, and we're no doubt
missing or misclassifying the subject languages of a lot of records.
Nevertheless, we do have a search engine that is using Ethnologue
codes to identify resources by subject.language, thereby
demonstrating the utility of this recommendation.
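A minimal sketch of the kind of translation step described above, assuming a lookup table built from the Ethnologue name and alternate-name tables. The table contents, the function names `normalize` and `to_code`, and the "drop a trailing 'language'" rule are all hypothetical illustrations, not the LINGUIST programmer's actual code.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical miniature lookup table mapping normalized language names
# (primary and alternate) to codes. A real service would build this from the
# full Ethnologue name and alternate-name tables; these entries are illustrative.
NAME_TABLE = {
    "english": "ENG",
    "french": "FRN",
    "francais": "FRN",  # alternate name, stored without its accent
}


def normalize(name: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    letters_only = re.sub(r"[^a-z ]+", " ", no_accents.lower())
    return " ".join(letters_only.split())


def to_code(free_text: str) -> Optional[str]:
    """Map a free-text subject-language value to a code, or None if unmatched."""
    key = normalize(free_text)
    # Hypothetical heuristic: drop a trailing generic word ("French language").
    if key.endswith(" language"):
        key = key[: -len(" language")]
    return NAME_TABLE.get(key)


if __name__ == "__main__":
    for value in ["Français", "  English language ", "Klingon"]:
        print(repr(value), "->", to_code(value))
```

Values that fall through and return None are the records such a search engine would miss; spelling variants absent from the name tables are where misclassification creeps in.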

But what are we going to do for linguistic data type and all the
other erstwhile controlled vocabularies?? There's no "alternate
name" reference for extensions (at least not as far as I know), such
that we could use it to write a translation program . . . even if it were
feasible to translate every relevant value in every metadata record.
And it makes no sense to set up search facilities that use the
recommended vocabulary if there's no data classified by it--getting
a lot of "not found" messages will discourage users from using the
recommended vocabulary, not encourage it.  So our search engine
is not going to be any help in promulgating these recommendations.

Sigh.  I realize that mandating a controlled vocabulary wouldn't
ensure that archives used it.  Perhaps it would give them a little
more impetus, however.  And it would certainly be nice if each
archive would "translate" its user-defined metadata into the
recommended OLAC vocabulary, rather than leaving the service
provider to figure out how to do it for multiple archives, each with
its own idiosyncratic and undocumented set of extensions.
I'm still hoping that you and Steven will come up with some bright
ideas about how to help/encourage/convince archives to do this . . .

Sorry to be negative.  You know I think OLAC is the best thing
since sliced bread. . . . I'm just having some trouble figuring out
how we're going to cope with the new-fangled slices....

All the best,
-Helen


Date sent:      	Thu, 26 Sep 2002 19:24:30 -0500
Send reply to:  	Open Language Archives Community Implementers List
             	<OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG>
From:           	Gary Simons <Gary_Simons at SIL.ORG>
Subject:        	Re: A simpler format for OLAC vocabularies and schemes
To:             	OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG

> Helen,
>
> You hit the nail on the head when you observe: "in that case, I don't see
> the real difference between recommendations and a centrally validated
> standard".  It was that same observation, but coming from the point of view
> of our status quo, that has been a key part of the motivation as Steven and
> I have been thinking about what our version 1.0 standard should look like.
>
> In version 0.4 we have a centrally validated and mandated standard, but it
> has built-in optionality.  For instance, it is our standard to use SIL and
> Linguist codes to identify languages precisely, but data providers also
> have the option of just providing free text.  Thus the standard is
> currently not requiring language codes but only recommending them as best
> practice, and an examination of the harvested records from our 20 or so
> participating data providers reveals that many sites are not currently
> using codes.
>
> Our proposal to take the controlled vocabularies out of the standard and to
> treat them as best practice recommendations thus does not really change the
> current reality.  In fact, it probably gives a better reflection of the
> reality. One key advantage from the point of view of managing the
> infrastructure is that it will not be necessary to change the standard when
> controlled vocabularies are changed or added.  The metadata standard would
> just specify the structure of the container record and the mechanism for
> defining metadata extensions and would be very static.  Each controlled
> vocabulary would be managed separately in an independent document and in a
> formal extension definition that would supply downloadable code sets so
> that extension data can still be centrally validated.  When the community
> reaches a consensus that a particular vocabulary should be used when
> applicable, then it would become a community Recommendation and our default
> harvester would support it. Service providers would exploit it (such as
> Linguist is now doing with searching by language) and that would show data
> providers who are not yet using the vocabulary the benefits of using it.
> We could even have a "Recommended practice report card" that would show
> which recommended extensions an archive is using and which it is not.
>
> Thus Steven and I are assuming that the end result of this change would not
> weaken compliance with standardized vocabularies (which is already optional),
> but that it would make it much easier to manage changes to vocabularies and
> to experiment with specialized vocabularies.
>
> I hope that helps to clarify where we are coming from.
>
> -Gary Simons
>
>     Helen Dry <hdry at LINGUISTLIST.ORG>
>     Sent by: OLAC Implementers List <OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG>
>     To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG
>     cc:
>     Subject: Re: A simpler format for OLAC vocabularies and schemes
>     09/26/02 06:36 PM
>     Please respond to Open Language Archives Community Implementers List
>
> Hi, Gary (and everyone),
>
> I've just sent a long posting to the list explaining some of my problems
> with Steven's & Gary's proposal, so all I want to do here is respond
> briefly. I completely agree with your point about the value of syntactic
> simplification. But I'm not sure about the second point--reducing all OLAC
> vocabularies to recommendations. It's interesting where our opinions
> diverge--i.e., you see the benefits to the archive, which may already have
> a user-defined scheme, and I see the possible problems for the general
> service provider, which may not be able to handle multiple user-defined
> schemes in an efficient way. Perhaps OLAC can handle this problem by
> making STRONG recommendations . . . but in that case, I don't see the real
> difference between recommendations and a centrally validated standard . . .
> except for the fact that OLAC wouldn't have to re-publish all the metadata
> whenever a recommendation changed. I suppose this would be an
> administrative advantage--but enough of a one to lose the potential
> benefits of standardization??? I'm waiting to be convinced....
>
> -Helen
>
>
>
> On 24 Sep 2002 at 10:07, Gary Holton wrote:
>
> On Mon, 16 Sep 2002 17:39:54 EDT, Steven Bird <sb at UNAGI.CIS.UPENN.EDU>
> wrote:
> >--
> >
> >So, what do you think?  Do you agree with our proposals for
> >(i) a syntactic simplification in our XML representation, and
> >(ii) switching OLAC vocabularies from being centrally validated
> >standards to recommendations?  We would welcome your feedback.
> >
>
>
> Dear Steven & Gary,
>
> I haven't had much time to digest your proposal, but my initial reaction is
> very positive. Regarding (i), it is clear that a syntactic simplification
> is needed. I for one have never been able to keep straight refinements vs.
> schemes, and I don't think I'm alone here. And as you point out in (ii), the
> real issue should be not whether a particular refinement (and associated
> vocabulary) has been officially adopted (mandated?), but rather whether
> such a refinement is useful to the community. We can debate ontologies, but
> it is more difficult to debate usefulness without actually implementing a
> refinement. Your proposal would permit refinements ("extensions") to fit
> the needs of the community, so that useful solutions could evolve.
>
> I have often approached the metadata issue by trying to imagine what types
> of refinements and vocabularies would be useful to the end user. The
> difficulty is that we don't know enough about how the user will be
> searching, what they will be searching for, and what types of search
> facilities they will have. The best we can do at this point is make an
> educated guess and then watch closely to see how the refinements and
> vocabularies are actually used. That said, I think we have some very good
> guesses already and will certainly be able to recommend best practices by
> December. However, if we lock in the vocabularies then most archives will
> continue to have to support both an OLAC schema and a user-defined schema
> (as you point out). This would essentially remove the data provider from
> the loop, in that user-defined schemas would be viewed as idiosyncratic and
> non-standard. Allowing user-defined "extensions" would encourage innovation
> on the part of both data and service providers--innovation mediated by the
> end user.
>
> Any reactions from others?
>
> Gary Holton
