News from the Open Language Archives Community (OLAC)

Steven Bird olac-admin at
Mon Mar 29 23:03:05 UTC 2004

Dear Community,

Here is a summary of the developments in the Open Language
Archives Community since our last general news posting in September.
Full details are available at:


The OLAC Metadata standard has been promoted to `adopted' status by
the OLAC Council following a 12-month period of experimentation by
OLAC implementers.  This document defines the format used for the
interchange of metadata within the framework of the Open Archives
Initiative. The metadata set is based on Qualified Dublin Core, but
the format allows for the use of extensions to express
community-specific qualifiers.

OLAC Metadata (Adopted standard, 2003-12-08):


The Linguistic Data Consortium at the University of Pennsylvania now
hosts an OLAC search interface modelled on Google.  Features include
result summaries by archive, result ranking, approximate language name
matching, and country-based searches.  The service was developed by
Amol Kamat, Baden Hughes, and Steven Bird at the University of
Melbourne, with sponsorship from the Department of Computer Science
and Software Engineering and the Linguistic Data Consortium.

LDC Service Provider:


In a recent Survey of Digital Library Aggregation Services, published
by the Digital Library Federation, Martha Brogan praised the Open
Language Archives Community as exemplary. She concluded her discussion
with the following statement:

  OLAC is exemplary in several ways: the technical and social
  infrastructure that it has developed to support its community of
  contributors, based on shared principles and standards; the
  resources that it provides at its Web site about its purpose, scope,
  history, tools, news and events; and the efforts of its two leaders
  -- Gary Simons and Steven Bird [2003a, 2003b, 2003c] -- to
  articulate the challenges, analyze the options, and recommend
  possible solutions to their community of contributors in order to
  improve OLAC. With the formal appointment of an Outreach Working
  Group and its other efforts to accommodate small archives that lack
  technical support, OLAC's content and influence is likely to grow.

A Survey of Digital Library Aggregation Services
Digital Library Federation


Steven Bird and Gary Simons (2004), Building an Open Language Archives
    Community on the DC Foundation, to appear in Hillmann and
    Westbrooks (editors), Metadata in Practice: A Work in Progress,
    ALA Editions.

Abstract: The Open Language Archives Community is an international
    partnership of institutions and individuals that is creating a
    worldwide virtual library of language resources.  We report on the
    development of OLAC metadata as a specialization of Dublin Core
    metadata and then describe the interoperability framework in which
    the metadata is validated, disseminated and aggregated.  We also
    discuss the community-centered process by which OLAC standards and
    practices are created and maintained.  In each of these three
    areas, metadata, interoperability, and process, we show how OLAC
    began with a model that was too cumbersome to implement then found
    a new formulation which worked in practice.  By reporting on this
    experience of metadata in practice, we hope to show how a
    specialist community can address its resource discovery needs by
    building on the Dublin Core foundation.


The current international standard for language identification codes
(ISO 639) covers only about 5% of known languages.  OLAC's controlled
vocabulary for identifying languages in resource metadata achieves
coverage by augmenting ISO codes with codes for all living languages
from SIL International's Ethnologue and for extinct and constructed
languages from Linguist List.  The present SIL and Linguist List codes
are not compatible with existing ISO codes.  However, work is in
progress to align the SIL and Linguist List codes with the ISO codes
and to define a new Part 3 of ISO 639 that will be a superset of ISO
639-2 and cover all known languages (past and present).  A committee
draft was balloted by member bodies of ISO/TC37/SC 2 and approved in
January 2004 for advancement with revisions to the stage of Draft
International Standard.  Final adoption is more than a year away,
since at least two more rounds of balloting are required.  OLAC's
vocabulary for identifying languages is expected to change to the new
standard if it is adopted.

OLAC's controlled vocabulary for identifying languages
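
As a rough illustration of how a vocabulary can augment ISO codes with
codes from two other sources without clashes, here is a minimal Python
sketch.  It is not the OLAC implementation: the function name, the
"x-sil-" and "x-ll-" prefixes, and the sample codes are assumptions
made for the example only.

```python
from typing import Dict, NamedTuple


class Entry(NamedTuple):
    name: str    # language name
    source: str  # which code list the entry came from


def build_vocabulary(
    iso: Dict[str, str],
    sil: Dict[str, str],
    linguist_list: Dict[str, str],
) -> Dict[str, Entry]:
    """Merge three code lists into one vocabulary.

    ISO codes are kept unchanged.  Codes from the other two lists get a
    source prefix (an assumed convention) so they can never collide
    with an ISO code or with each other.
    """
    vocab: Dict[str, Entry] = {}
    for source, prefix, codes in (
        ("ISO 639", "", iso),
        ("SIL Ethnologue", "x-sil-", sil),
        ("Linguist List", "x-ll-", linguist_list),
    ):
        for code, name in codes.items():
            vocab[prefix + code] = Entry(name, source)
    return vocab


# Tiny illustrative inputs; the real code lists are far larger, and the
# non-ISO codes below are placeholders, not verified entries.
vocab = build_vocabulary(
    iso={"en": "English", "de": "German"},
    sil={"AAA": "Example living language"},
    linguist_list={"AAA": "Example extinct language"},
)
```

In this sketch the same three-letter code can appear in both the SIL
and Linguist List inputs and still yield two distinct vocabulary
entries, which is the point of prefixing codes by source.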


On March 2, the Rosetta Disk left Earth on board an Ariane-5 rocket
from the European Spaceport in Kourou, French Guiana.  The mission's
target is the comet Churyumov-Gerasimenko, which will be reached in
2014 after a "billiard ball" journey through the Solar System lasting
more than ten years.  The Rosetta Disk is a modern version of the
Rosetta Stone. The 2-inch nickel disk is micro-etched with 30,000
pages of information covering over 1,000 languages.  For each language
there is a simple dictionary, a guide to pronunciation and counting,
and a traditional story with translation.  Additionally, to aid
decipherment in the distant future, a translation of a common
text (the first three chapters of the book of Genesis) is provided in
all languages.  The disk can be read with the aid of an optical
microscope.  The materials on the disk come from the Rosetta 1000
Language Archive, an OLAC repository.

Rosetta 1000 Language Archive:
European Space Agency Rosetta Mission:
LanguageLog: Offsite backup for world's languages:


OLAC invites every researcher and archive to become an OLAC Data
Provider by submitting information about available language-oriented
resources. The information requested includes the language described
or analyzed, the format of the resource (e.g., web page, hard copy,
cassette), how it can be accessed, and so on.  Note that the data
itself is not requested, so you still have full control over who
accesses it.  The OLAC Repository Editor (ORE) is a service offered at
the LINGUIST site, ideally suited to individual repositories.  For
example, one could use ORE to set up a repository called "John Smith's
Warlpiri Resources," with records for field notes, recordings,
grammatical sketches, lexicons, unpublished papers, and so forth.

OLAC Repository Editor


The archive report cards, recently added to the OLAC site, give
summary statistics for each repository and an assessment of the
quality of the repository's metadata.  The report cards can be
accessed by clicking the "REPORT CARD" links on the OLAC Archives
page.  The service was developed by Amol Kamat, Baden Hughes, and
Steven Bird at the University of Melbourne, with sponsorship from the
Department of Computer Science and Software Engineering.

OLAC Archives Page (see "REPORT CARD" links):
Report for full set of repositories:
Documentation on report cards:


Martin Wynne (Oxford Text Archive) presented OLAC at the Third
Workshop on the Open Archives Initiative, held in Geneva last month.

Third Workshop on the Open Archives Initiative:
Slides from Martin's talk:


A recent piece about OLAC in LanguageLog explains OLAC to a broad
audience.  It demonstrates the need for language archive search services
by comparing OLAC and Google searches for Santa Cruz (a language of the
Solomon Islands).

LanguageLog: Searching for Santa Cruz

Best wishes,
Steven & Gary
Steven Bird, University of Melbourne (sb at
Gary Simons, SIL International (gary_simons at
OLAC Coordinators (
