Reminder: Call for review of new metadata documents

Helen Aristar-Dry hdry at LINGUISTLIST.ORG
Sun Mar 30 14:20:53 UTC 2008


Extremely sensible remarks, Jeff.  I agree especially with the points 
about 'signal to noise' ratio and think that Gary's remarks on 
provenance or your revision, which gives an example, would be much more 
helpful.

-Helen

Jeff Good wrote:
> Dear Gary (and others),
> 
> Thanks a lot for that clarification. The relationship between the 
> mission statement and the granularity guidelines is much clearer to me 
> now. I agree with Helen that the reasoning should be made explicit 
> somewhere more prominent than this list. Your interpretation is not how 
> I have interpreted the mission statement largely because I missed out on 
> the significance of the word "library". I also find "virtual library" to 
> be ambiguous between what one might call a "digital library" and what 
> one might call an "aggregated library" (the latter sense being my label 
> for your understanding of the OLAC use).
> 
> I think it might be worth adding two questions to the FAQ (or answering 
> these questions in some other appropriate place): (i) What does OLAC 
> mean by "virtual library"? and (ii) What does OLAC mean by "language 
> archive"? That should help a lot with possible ambiguities in the 
> mission statement.
> 
> An important open issue, which is still not clear to me from your 
> explanation is whether OLAC is focusing on a "card catalog" now because 
> that's all OLAC ever sees itself doing or if it, instead, views getting 
> the card catalog part right as the first step towards a deeper kind of 
> interoperability. (My reading of the mission would be that the latter 
> interpretation is correct, but I already missed out on the importance of 
> "library" in the mission. So, I'm probably missing several other points. 
> I think the crucial point in this regard is understanding what level 
> of interoperability one hopes to achieve with respect to the 
> "interoperating repositories".) I don't think this is a merely pedantic 
> issue right now because it matters a lot for how we "advertise" OLAC. Do 
> we say, "OLAC is all about search!" (my simplification of something 
> Helen said) or do we say, "OLAC aims for digital linguistic utopia 
> starting with search!". (For what it's worth, I don't really care 
> strongly about which path OLAC takes, but I would like to be confident 
> I'm describing OLAC's goals correctly to other people.)
> 
> 
>> aggregated catalogs, and (3) maintaining a catalog to help users find
>> resources.  The OLAC metadata standard is, of course, the 
>> specification for
>> how to create an entry for the catalog.
> 
> I'm actually somewhat confused by the fact that you say OLAC is 
> maintaining a catalog of resources. It was my understanding that OLAC is 
> right now only maintaining one kind of "catalog", but not one of 
> resources. Rather, it maintains a list of participating archives. The 
> full catalogs of resources (for linguists, at least) are maintained by 
> the two service providers: LINGUIST and the LDC. (I know there are lots 
> of connections between OLAC and these catalogs, but, strictly speaking, 
> I didn't think OLAC was in the catalog maintenance business but, rather, 
> defined a way through which a catalog could be maintained by outside 
> parties.)
> 
> 
>> book, rather than an individual recorded session.  Thus I think this
>> interpretation of desired granularity is straightforwardly implied by the
>> OLAC mission of creating a virtual library.  If you have any ideas of
>> specific wording changes in the granularity guidelines that might help to
>> clarify this, I'll be glad to hear them.
> 
> I think your response already has all of the required points. Helen 
> seemed to suggest adding an explanation to the mission statement. I'll 
> let you and Steven decide if that's appropriate. (I'm not sure what the 
> process is for adding explanations to the mission statement.)
> 
> With respect to the guidelines, I recommend changing the first paragraph 
> of the granularity discussion to something like the following (based on 
> my understanding of your explanation):
> 
> "Determining the right level for units to be described as language 
> resources in the OLAC context involves multiple factors. The level of 
> unit appropriate for inclusion in an aggregated catalog like OLAC's may 
> be different (typically higher) than the level desirable for the catalog 
> of a specific institution's holdings, which in turn is typically higher 
> than the level desirable for describing the detailed contents of a 
> resource. Consistent with its mission to create a virtual _library_ of 
> language resources, a basic rule of thumb for making determinations 
> regarding what kinds of units to treat as language resources should be 
> that they should be comparable to the kinds of units treated as 
> resources in a traditional library catalog. For example, libraries 
> typically assign a single record to each book, not to each chapter 
> within a book. A parallel example in the OLAC context would be treating 
> all the objects associated with a particular field trip as a single unit 
> rather than treating each of the individual resources created during 
> that field trip as separate units. The following discussion is aimed at 
> assisting an OLAC participant to find the right level of description."
> 
> It might then be nice to give lots of concrete examples; maybe you could 
> get some of the participating archives to do this?
> 
> One thing I deleted from that paragraph was reference to the 
> recommendation given in the Repository guidelines:
> "A metadata repository should not degrade the 'signal-to-noise ratio' 
> for language resource discovery."
> 
> I don't find this recommendation very helpful because (for me, at least) 
> it is too dependent on what kinds of resources I want to discover. In 
> other words, "language resource discovery" is too broad an activity for 
> there to be one "signal-to-noise ratio". For example, if I already know 
> I'm looking for resources on Nahuatl, I would probably not want to find 
> a record saying, "There's a bunch of material on Nahuatl that's part of 
> some bundle over at AILLA." The signal would be too weak for me--what 
> I'd prefer is the search result I'd get from AILLA's catalog. Of course, 
> for the next person, lots of detailed records about Nahuatl would 
> constitute "noise". Signal and noise just don't strike me as constant 
> enough to form the basis of a recommendation.
> 
> I also don't like that this recommendation privileges language resource 
> _discovery_ over other possible uses of the catalog. For example, 
> library catalogs have at least one other function in addition to 
> discovery: retrieval. Often I know a resource exists, but I don't know 
> where it is, which is why I consult the catalog (this is my primary use 
> of WorldCat, for example). (The word "discovery" is potentially 
> ambiguous enough to cover "find something previously unknown" and 
> "retrieve", but that's not my initial reading.) So, I would prefer a 
> recommendation that was more agnostic regarding the use of the metadata.
> 
> I personally find your new discussion of provenance in the metadata 
> usage guidelines much more helpful than 'signal-to-noise ratio', since 
> it's not dependent on particular uses of OLAC service providers. So, I'd 
> actually recommend revising the repository guidelines regarding 
> granularity, changing the present recommendation to something like:
> 
> "A metadata repository should treat resources with a single provenance 
> as constituting a single unit with respect to OLAC metadata and should, 
> therefore, describe them within a single record."
> 
> Another advantage to talking about granularity in terms of provenance in 
> my view is that the current guidelines seem to be asking data providers 
> to hypothesize about what search scenarios their data will be put to, 
> but I don't think it's reasonable to expect data providers to be very 
> good at this, or even to ask them to spend time thinking about this. 
> That's a job for service providers. Framing the issue in terms of 
> provenance allows data providers to use a kind of information they are, 
> in principle, experts about to structure their collections, which is 
> presumably a good way to achieve consistency. Furthermore, it allows 
> service providers to be reasonably confident that they are aggregating 
> records of the same basic kind from different data providers. It is 
> thus more consonant with the overall OAI model wherein data providers 
> and service providers interact in terms of a well-defined series of 
> agreements without the one having to pay attention to the internal 
> activities of the other.
> 
> Jeff

-- 
Helen Aristar-Dry
Professor of Linguistics
Director, Institute for Language Information and Technology (ILIT)
Eastern Michigan University
2000 Huron River Rd., Suite 104
Ypsilanti, MI 48197

734.487.0144 (ILIT office)
734.487.7952 (faculty office)
734.482.0132 (fax)
hdry at linguistlist.org



More information about the Olac-implementers mailing list