Reminder: Call for review of new metadata documents

Jeff Good jcgood at BUFFALO.EDU
Sat Mar 29 20:15:30 UTC 2008


Dear Gary (and others),

Thanks a lot for that clarification. The relationship between the
mission statement and the granularity guidelines is much clearer to me
now. I agree with Helen that the reasoning should be made explicit
somewhere more prominent than this list. I had not interpreted the
mission statement the way you do, largely because I missed the
significance of the word "library". I also find "virtual library" to
be ambiguous between what one might call a "digital library" and what
one might call an "aggregated library" (the latter being my label for
your understanding of the OLAC use).

I think it might be worth adding two questions to the FAQ (or  
answering these questions in some other appropriate place): (i) What  
does OLAC mean by "virtual library"? and (ii) What does OLAC mean by  
"language archive"? That should help a lot with possible ambiguities  
in the mission statement.

An important open issue, which is still not clear to me from your
explanation, is whether OLAC is focusing on a "card catalog" now
because that is all OLAC ever sees itself doing, or because it views
getting the card catalog right as the first step towards a deeper kind
of interoperability. (My reading of the mission would be that the
latter interpretation is correct, but I already missed the importance
of "library" in the mission, so I'm probably missing several other
points. I think the crucial point in this regard is understanding what
level of interoperability one hopes to achieve among the
"interoperating repositories".) I don't think this is a merely pedantic
issue right now, because it matters a lot for how we "advertise" OLAC.
Do we say, "OLAC is all about search!" (my simplification of something
Helen said), or do we say, "OLAC aims for digital linguistic utopia,
starting with search!"? (For what it's worth, I don't care strongly
about which path OLAC takes, but I would like to be confident that I'm
describing OLAC's goals correctly to other people.)


> aggregated catalogs, and (3) maintaining a catalog to help users find
> resources.  The OLAC metadata standard is, of course, the specification
> for how to create an entry for the catalog.

I'm actually somewhat confused by your statement that OLAC is
maintaining a catalog of resources. My understanding was that OLAC
currently maintains only one kind of "catalog", and not one of
resources: rather, it maintains a list of participating archives. The
full catalogs of resources (for linguists, at least) are maintained by
the two service providers: LINGUIST and the LDC. (I know there are
lots of connections between OLAC and these catalogs, but, strictly
speaking, I didn't think OLAC was in the catalog maintenance business;
rather, it defines a way through which a catalog can be maintained by
outside parties.)


> book, rather than an individual recorded session.  Thus I think this
> interpretation of desired granularity is straightforwardly implied by
> the OLAC mission of creating a virtual library.  If you have any ideas
> of specific wording changes in the granularity guidelines that might
> help to clarify this, I'll be glad to hear them.

I think your response already has all of the required points. Helen  
seemed to suggest adding an explanation to the mission statement. I'll  
let you and Steven decide if that's appropriate. (I'm not sure what  
the process is for adding explanations to the mission statement.)

With respect to the guidelines, I recommend changing the first  
paragraph of the granularity discussion to something like the  
following (based on my understanding of your explanation):

"Determining the right level for units to be described as language
resources in the OLAC context involves multiple factors. The level of
unit appropriate for inclusion in an aggregated catalog like OLAC's
may differ from (and is typically higher than) the level desirable for
the catalog of a specific institution's holdings, which in turn is
typically higher than the level desirable for describing the detailed
contents of a resource. Consistent with OLAC's mission to create a
virtual _library_ of language resources, a basic rule of thumb is that
the units treated as language resources should be comparable to the
kinds of units treated as resources in a traditional library catalog.
For example, libraries typically assign a single record to each book,
not to each chapter within a book. A parallel example in the OLAC
context would be treating all the objects associated with a particular
field trip as a single unit rather than treating each of the
individual resources created during that field trip as a separate
unit. The following discussion is aimed at helping OLAC participants
find the right level of description."

It might then be nice to give lots of concrete examples. Maybe you
could get some of the participating archives to provide them?

One thing I deleted from that paragraph was the reference to the
recommendation given in the Repository guidelines:
"A metadata repository should not degrade the 'signal-to-noise ratio'  
for language resource discovery."

I don't find this recommendation very helpful because (for me, at  
least) it is too dependent on what kinds of resources I want to  
discover. In other words, "language resource discovery" is too broad  
an activity for there to be one "signal-to-noise ratio". For example,  
if I already know I'm looking for resources on Nahuatl, I would  
probably not want to find a record saying, "There's a bunch of  
material on Nahuatl that's part of some bundle over at AILLA." The  
signal would be too weak for me; what I'd prefer is the search result
I'd get from AILLA's catalog. Of course, for the next person, lots of  
detailed records about Nahuatl would constitute "noise". Signal and  
noise just don't strike me as constant enough to form the basis of a  
recommendation.

I also don't like that this recommendation privileges language  
resource _discovery_ over other possible uses of the catalog. For  
example, library catalogs have at least one other function in addition  
to discovery: retrieval. Often I know a resource exists, but I don't  
know where it is, which is why I consult the catalog (this is my  
primary use of WorldCat, for example). (The word "discovery" is  
potentially ambiguous enough to cover "find something previously  
unknown" and "retrieve", but that's not my initial reading.) So, I  
would prefer a recommendation that was more agnostic regarding the use  
of the metadata.

I personally find your new discussion of provenance in the metadata
usage guidelines much more helpful than the 'signal-to-noise ratio',
since it doesn't depend on particular uses of OLAC service providers.
So I'd actually recommend revising the present granularity
recommendation in the repository guidelines to something like:

"A metadata repository should treat resources with a single provenance  
as constituting a single unit with respect to OLAC metadata and  
should, therefore, describe them within a single record."

Another advantage of framing granularity in terms of provenance, in
my view, is that the current guidelines seem to ask data providers to
hypothesize about the search scenarios their data will be put to, but
I don't think it's reasonable to expect data providers to be very good
at this, or even to ask them to spend time thinking about it. That's a
job for service providers. Framing the issue in terms of provenance
allows data providers to structure their collections using a kind of
information they are, in principle, experts in, which is presumably a
good way to achieve consistency. Furthermore, it allows service
providers to be reasonably confident that they are aggregating records
of the same basic kind from different data providers. It is thus more
consonant with the overall OAI model, wherein data providers and
service providers interact through a well-defined series of agreements
without either having to pay attention to the internal activities of
the other.

Jeff



More information about the Olac-implementers mailing list