News from the Open Language Archives Community (OLAC)

Steven Bird sb at
Fri Jun 13 02:51:09 UTC 2003

Dear Community,

There have been many significant developments in the Open Language
Archives Community since our last general news posting in July 2002.
Here is a summary; full details are available on the OLAC website at


Last December the IRCS Workshop on Open Language Archives was held at
the University of Pennsylvania.  This meeting revised core OLAC
standards and established a new metadata format, version 1.0.  Full
details, including online proceedings, are available.

  Workshop website:


Here is the full list of current archives:

A Digital Archive of Research Papers in Computational Linguistics,
  Philadelphia, USA
Academia Sinica Formosan Language Archive,
  Taipei, Taiwan
Australian Studies Electronic Data Archive,
  Canberra, Australia
Archive of the Indigenous Languages of Latin America, UT Austin,
  Austin, USA
ATILF Resources,
  Nancy, France
Cornell Language Acquisition Laboratory,
  Ithaca, USA
Ethnologue: Languages of the World, SIL International,
  Dallas, USA
European Language Resources Association,
  Paris, France
Rosetta Project 1000 Language Archive, Long Now Foundation,
  San Francisco, USA
Surrey Morphology Group Databases, University of Surrey,
  Guildford, UK
Survey for California and Other Indian Languages, UC Berkeley,
  Berkeley, USA
TalkBank, Carnegie Mellon University,
  Pittsburgh, USA
The Linguistic Data Consortium Corpus Catalog,
  Philadelphia, USA
The Natural Language Software Registry,
  Saarbrucken, Germany
The Typological Database Project,
  Utrecht, Netherlands
Tibetan and Himalayan Digital Library, University of Virginia,
  Charlottesville, USA
TRACTOR Archive,
  Oxford, UK
Flint Archive, University of Queensland,
  Brisbane, Australia

(Note that some formerly-registered archives are no longer on the list
as they do not conform to the OLAC 1.0 standard.  Administrators of
those archives should review the OLAC Metadata and OLAC Repositories
documents, update their repositories, and re-register with OLAC.)


Over the last year OLAC has been featured in Scientific American and
Wired News, and on the BBC World Service.  Please see the news
section of the website for pointers.

  OLAC News page:


We invite comment from the wider language resources community on
OLAC's proposed standards and recommendations.  The standards concern
the operation of OLAC's core infrastructure (protocols and processes)
and are mostly of concern to digital archivists.  The standards are
discussed on the OLAC-Implementers mailing list.  The recommendations,
on the other hand, concern best practices in language resource
description, and are mostly of concern to institutions and individuals
who create and use language resources.  The recommendations are
discussed on the METADATA mailing list.

  Proposed standards:
    OLAC Metadata (2002-12-11):
    OLAC Process (2002-12-10):
    OLAC Repositories (2003-05-28):
    OLAC-Implementers mailing list:

  Proposed recommendations:
    Recommended metadata vocabularies for Discourse Types, Language
    Identification, Linguistic Field, Linguistic Data Types and
    Participant Roles:
    METADATA mailing list:


Changes in OLAC standards, and also in underlying standards from the
Open Archives Initiative and the Dublin Core Metadata Initiative, have
required far-reaching changes in OLAC infrastructure.  Over the last
six months we have re-implemented all of the software infrastructure
on the OLAC website.  This work has been supported by the NSF EMELD
and TalkBank projects.

As a consequence, it is now easier than ever to set up an
institutional or individual metadata repository (i.e. resource
catalog) and register it with OLAC.  The simplest method is to create
an XML file describing language resources, post it on a website, and
register it with OLAC.  Such catalogs are checked twice daily by the
OLAC harvester, and any changes are incorporated into the central
resource catalog maintained on the OLAC site.  This is then made
available to other services including the LINGUIST site.
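
For readers who want a feel for the "XML file on a website" approach,
here is a minimal sketch in Python.  It is an illustration only and
not the official OLAC format: the <catalog> and <record> wrapper
elements and the build_catalog function are hypothetical names chosen
for this example, and only the Dublin Core namespace is taken from the
published Dublin Core element set.  Please consult the OLAC Metadata
and OLAC Repositories documents for the actual schema before
registering anything.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace (from the DCMI specification).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_catalog(resources):
    """Return an XML string with one record per resource description.

    Each resource is a dict mapping Dublin Core element names
    (e.g. "title", "creator", "language") to string values.
    The <catalog> and <record> wrappers are hypothetical, used here
    only to show the overall shape of a static resource catalog.
    """
    root = ET.Element("catalog")
    for resource in resources:
        record = ET.SubElement(root, "record")
        for field, value in resource.items():
            element = ET.SubElement(record, f"{{{DC_NS}}}{field}")
            element.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample data, for illustration only.
resources = [
    {
        "title": "Example word list",
        "creator": "Example, A.",
        "language": "en",
    },
]

xml_text = build_catalog(resources)
print(xml_text)
```

The resulting text could be saved to a file and posted on a website;
keeping the descriptions as plain data and generating the XML from
them makes it easy to regenerate the file whenever the catalog
changes, which suits a harvester that checks for changes regularly.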

  EMELD Project: Electronic Metastructure for Endangered Languages Data
  TalkBank Project
  OLAC Service Provider at LINGUIST


The following research publications concerning OLAC will appear in
2003.  All are available from the documents section of the OLAC website.

  The Open Language Archives Community: An infrastructure for
    distributed archiving of language resources, Literary and Linguistic
    Computing 18(1), Special Issue on New Directions in Humanities
    Computing, 2003.

  Building an Open Language Archives Community on the OAI foundation,
    Library Hi Tech 21(2), Special Issue on the Open Archives
    Initiative, 2003.

  Extending Dublin Core Metadata to support the description and
    discovery of language resources, to appear in Computing and the
    Humanities 37, 2003.

  Seven dimensions of portability for language documentation and
    description, to appear in Language 79, 2003.

  OLAC Documents page:


The OLAC Working Group on Outreach will raise awareness of the
activities and resources of OLAC by facilitating the production of
general-audience documents describing various aspects of OLAC and by
contacting individuals and organizations who manage archives but are
not yet part of OLAC.

The group has the following working draft: A Gentle Introduction to
Metadata (Jeff Good)

The group is conducting its work on the OLAC-OUTREACH mailing list,
which is hosted on the LINGUIST site. To learn more and to join the
group, please see the Outreach Working Group page.

  OLAC Outreach Working Group:

Best wishes,
Steven & Gary
Steven Bird, U Melbourne and U Pennsylvania (sb at
Gary Simons, SIL International (gary_simons at
OLAC Coordinators (
