Last call for review of new metadata documents
Gary Simons
Gary_Simons at SIL.ORG
Mon Mar 31 15:20:47 UTC 2008
Dear implementers,
Today being the stated last day of the review period, this is the last call
for comments on the documents about metadata usage guidelines and metrics.
(The original call with the URLs is appended.) That is not to say that we
won't accept comments after today. If you send comments in
the next few days, we'll gladly receive them. However, we'll start working
on the next phase of the process soon.
Thanks to Jeff and Helen for stimulating discussion on the granularity
issue. We have good feedback for revising the granularity section of the
usage guidelines. Jeff's proposal to replace the signal-to-noise-ratio
statement with one centered on provenance is a good one. That affects the
OLAC Process standard, since that is where signal-to-noise is stated as a
principle for judging new registration applications. A light revision of
our standards is in the wings as part of moving from version 1.0 to 1.1 of the
metadata standard, so we can address that point then. I also like the
suggestion of using the FAQ to bring out these explanations of what is
implied by "virtual library" in the mission statement.
Jeff raises a question that I should answer, namely, "I'm actually somewhat
confused by the fact that you say OLAC is maintaining a catalog of
resources. It was my understanding that OLAC is right now only maintaining
one kind of 'catalog', but not one of resources. Rather, it maintains a
list of participating archives. The full catalogs of resources (for
linguists, at least) are maintained by the two service providers: LINGUIST
and the LDC." While the LDC and Linguist search engines are the visible
faces of search (and still others are possible given the OAI-PMH model), an
important (but not so visible) service provided centrally by OLAC is the
OLAC Aggregator. OLAC runs an incremental harvest every 12 hours of all
registered repositories and offers a single aggregated catalog to the world
via the OAI-PMH at the following base URL (which actually generates a
useful documentation page if you visit it):
http://www.language-archives.org/cgi-bin/olaca3.pl
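As an illustration of what incremental harvesting from that base URL involves, here is a minimal Python sketch using only the standard library. It assumes the standard OAI-PMH 2.0 `ListRecords` verb, with a `from` date for selective harvesting and `resumptionToken` for paging through large result lists, and it assumes `olac` as the metadata prefix. The `fetch` callable and all function names are hypothetical, and error responses such as `noRecordsMatch` are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL of the OLACA data provider, as given in the message above.
OLACA_BASE_URL = "http://www.language-archives.org/cgi-bin/olaca3.pl"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument in OAI-PMH 2.0:
        # only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Selective (incremental) harvest: records changed on or after this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    # An empty or missing resumptionToken means this is the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest(fetch, base_url, metadata_prefix, from_date=None):
    """Collect identifiers across all pages; fetch(url) is a hypothetical
    callable that returns the response body as a string."""
    url = list_records_url(base_url, metadata_prefix, from_date=from_date)
    identifiers = []
    while True:
        ids, token = parse_list_records(fetch(url))
        identifiers.extend(ids)
        if token is None:
            return identifiers
        url = list_records_url(base_url, metadata_prefix, resumption_token=token)
```

In a real client, `fetch` could wrap `urllib.request.urlopen`; passing the date of the previous successful harvest as `from_date` is what makes each run incremental rather than a full re-harvest.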
This is where, for instance, the mandatory OAI_DC metadata format of the
OAI-PMH is implemented. In static repositories, data providers give only
OLAC metadata, but OLAC plugs them into the wider OAI-PMH world by
providing the crosswalk to OAI_DC format as a value-added service in the
single OLACA repository. All of the new work with metrics and quality
checks is also based on the aggregated catalog. OLAC does not "maintain" an
original catalog in the same way that each data provider maintains its
catalog, but we are maintaining the aggregated catalog of the virtual
library by harvesting every day to keep it up to date and running checks to
maintain quality. OLACA can also serve as the single point of contact for
anyone who wants to implement a service based on OLAC metadata. There are two
possible approaches: to run the OLAC harvester independently (and create
one's own aggregated catalog) or simply to harvest from the pre-aggregated
OLACA data provider.
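To make the crosswalk to OAI_DC described above more concrete, the following hypothetical Python sketch maps the children of an OLAC record down to simple Dublin Core inside an `oai_dc:dc` container. It is not OLACA's actual crosswalk. The mapping table is deliberately partial, it assumes each DCMI refinement collapses to its parent DC element (for example `dcterms:created` to `dc:date`), and it simply drops OLAC attributes such as `xsi:type` and `olac:code`, together with any element whose value is carried only in such an attribute.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical, partial table: DCMI refinement -> parent simple DC element.
REFINEMENT_TO_DC = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "modified": "date",
    "spatial": "coverage",
    "temporal": "coverage",
    "isPartOf": "relation",
    "hasPart": "relation",
    "requires": "relation",
}


def olac_to_oai_dc(olac_record):
    """Map the children of an OLAC record element to a simple-DC oai_dc:dc element."""
    out = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for child in olac_record:
        if not child.tag.startswith("{"):
            continue  # skip elements with no namespace
        ns, _, local = child.tag[1:].partition("}")
        if ns == DC_NS:
            name = local
        elif ns == DCTERMS_NS and local in REFINEMENT_TO_DC:
            name = REFINEMENT_TO_DC[local]
        else:
            continue  # element this sketch cannot map
        text = (child.text or "").strip()
        if not text:
            continue  # value carried only in attributes (e.g. olac:code): dropped here
        # Attributes such as xsi:type are not copied: simple DC has no place for them.
        ET.SubElement(out, f"{{{DC_NS}}}{name}").text = text
    return out
```

Because a static repository supplies only the OLAC format, a transformation along these lines, run once in the aggregator, is what lets a single OAI-PMH endpoint answer `oai_dc` requests for every harvested record.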
There is another question that begs for an answer:
Do we say, "OLAC is all about search!" (my simplification of something
Helen said) or do we say, "OLAC aims for digital linguistic utopia
starting with search!".
The latter statement is closer, if you substitute "descriptive metadata"
for "search". But I hasten to add that when that utopia is reached, we
won't call the result OLAC, just like we don't confuse the web with the
W3C. My plenary talk for last summer's workshop, "Toward the
interoperability of language resources," paints a picture of such a utopia
as an interoperating cyberinfrastructure for linguistics (and gives a
diagram on the closing slide):
http://linguistlist.org/tilr/papers/TILR%20Plenary%20Slides.pdf
Of the 12 elements in the infrastructure, elements 1 through 4
(Aggregator, Metadata standard, Submission protocol, and Harvesting
protocol) are specifically identified as being OLAC's contribution. Those
standards are what make it possible for the other 8 elements to be built in
such a way that they interoperate with each other at least at the common
denominator level defined by the metadata standard. The OLAC process also
provides a way for the community developing this infrastructure to define
additional standards that promote community interoperation, but the vision
also includes specialized subcommunities getting together to define more
specific standards for their own areas of focus.
I hope that helps clarify things.
Best,
-Gary
Gary Simons <gary_simons at SIL.ORG>
Sent by: OLAC Implementers List <OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG>
03/05/2008 10:35 PM
To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG
Subject: Call for review of new metadata documents
Please respond to: Open Language Archives Community Implementers List <OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG>
Dear implementers,
Many of you also subscribe to the OLAC-GENERAL list and so have gotten the
general announcement about this call for review for new metadata documents.
Those of you who have implemented an OLAC data provider are directly
affected since this new work focuses on ways of improving the quality of
the metadata in our implementations. In this message we repeat the general
announcement for the benefit of those not subscribed to OLAC-GENERAL, and
then we supply further information that is relevant to you as implementers.
Six months ago the US National Science Foundation awarded funding for a
project named "OLAC: Accessing the World's Language Resources" which aims
to greatly improve access to language resources for linguists and the broader
communities of interest. If you are interested in learning more about the
project, you may visit the project home page at:
http://olac.wiki.sourceforge.net/
In the first phase of the project we are focusing on improving metadata
quality as a prerequisite to improving the quality of search. To that end
we have drafted some new documents that can serve as a basis for improving
and measuring metadata quality within our community:
Best Practice Recommendations for Language Resource Description
http://www.language-archives.org/REC/bpr.html
OLAC Metadata Usage Guidelines
http://www.language-archives.org/NOTE/usage.html
OLAC Metadata Quality Metrics
http://www.language-archives.org/NOTE/metrics.html
These documents have been reviewed in Draft status by the Metadata Working
Group. After significant revision, they are now promoted to Proposed status
and are thus ready for review by the entire community. In keeping with the
OLAC Process standard, we hereby make a formal call for review. The review
period will end on MARCH 31, at which point all of the comments that have
been received will be processed to create revised versions of the
documents.
You may submit comments by simply replying to this message. <End of general
announcement>
The OLAC Metadata Standard that you followed in implementing your
repository defines the constraints on validity for a metadata record, but it
gives no advice about what makes a metadata record high quality. The first two
documents listed above address this issue. Then, in keeping with the OLAC
core value of "Peer Review", we want to implement a service that will
measure conformance to the recommendations that can be automatically tested
for. That is the issue addressed by the third document listed above.
We have implemented the proposed Metadata Quality Score so that you can see
the implications for your current metadata. (As the documents are revised
to express community consensus, the implementation of the metrics will be
updated to match.) The metadata quality analysis as currently implemented
is accessible from a test version of the Participating Archives page. The site
has no links to this test page; it is accessed by entering this URL in a
browser:
http://www.language-archives.org/archives-new.php
Follow the "Sample Record" link for your archive to see the quality score
for the sample record named in your Identify response, along with comments
on what can be done to improve the score. Follow the "Metrics" link to see
the average quality score for the records you are currently providing.
Kudos to the Audio Archive of Linguistic Fieldwork (Berkeley), Centre de
Ressources pour la Description de l'Oral (CRDO), and the CHILDES Data
Repository, which are already getting scores of around 8 or higher. The rest of
us have room for significant improvement!
Eventually, this new Participating Archives page will replace the one that
is currently accessed from the ARCHIVES link in the OLAC site banner.
However, this will not happen right away. After the current round of review
and any subsequent revisions, the documents will be put to the OLAC
Council, who will check the revised documents and promote them to Candidate status
when they feel they are ready. Next we will issue a call for implementation
and give at least one month for implementer feedback. Based on that
feedback, final revisions will be made to the satisfaction of the Council
who will then grant Adopted status. The new Participating Archives page
will not replace the current one until the new guidelines and metrics are
adopted.
This discussion of process is to let you know that you will probably want
to update the implementation of your metadata repository some time
within the next few months. When these new metadata recommendations and
usage guidelines are officially adopted, the public will be able to see the
metrics scores for your repository. In the meantime, it is just other
implementers who are seeing them. You need not wait until the Candidate
call for implementation to begin implementing changes. As soon as your updated
repository is harvested, you will see the metrics change.
Again, the review period will end on MARCH 31, at which point all of the
comments that have been received will be processed to create revised
versions of the documents. You may submit comments by replying to the list
(and potentially entering into discussion with other implementers) or by
mailing them to <olac_project at gial.edu>. That account is handled by Debbie
Chang, a Masters candidate at the Graduate Institute of Applied Linguistics
who is the Research Assistant for our project. She will compile a list of
all the comments (whether submitted to the list or to the project account),
which the document editors will then be asked to respond to. That response
will come after the review period closes.
With a solid foundation based on quality metadata, our grant project will
be able to build improved search services and to expand coverage by attracting
more participating archives and by implementing gateways to other
aggregators. We are grateful for your participation in this venture and
trust that you share our excitement about its potential.
Best wishes,
Gary & Steven
_______
Steven Bird, University of Melbourne and University of Pennsylvania
Gary Simons, SIL International and GIAL
OLAC Coordinators (www.language-archives.org)