From olac-admin at language-archives.org Mon Mar 29 23:03:05 2004
From: olac-admin at language-archives.org (Steven Bird)
Date: Tue, 30 Mar 2004 09:03:05 +1000
Subject: News from the Open Language Archives Community (OLAC)
Message-ID:

Dear Community,

Here is a summary of the developments in the Open Language Archives Community since our last general news posting in September. Full details are available at:

http://www.language-archives.org/

OLAC METADATA STANDARD ADOPTED

The OLAC Metadata standard has been promoted to `adopted' status by the OLAC Council following a 12 month period of experimentation by OLAC implementers. This document defines the format used for the interchange of metadata within the framework of the Open Archives Initiative. The metadata set is based on Qualified Dublin Core, but the format allows for the use of extensions to express community-specific qualifiers.

OLAC Metadata (Adopted standard, 2003-12-08):
http://www.language-archives.org/OLAC/metadata.html

NEW LDC SERVICE PROVIDER

The Linguistic Data Consortium at the University of Pennsylvania now hosts an OLAC search interface modelled on Google. Features include result summaries by archive, result ranking, approximate language name matching, and country-based searches. The service was developed by Amol Kamat, Baden Hughes, and Steven Bird at the University of Melbourne, with sponsorship from the Department of Computer Science and Software Engineering and the Linguistic Data Consortium.

LDC Service Provider:
http://www.ldc.upenn.edu/olac/search.php

OLAC IDENTIFIED AS "EXEMPLARY" IN DIGITAL LIBRARY FEDERATION REPORT

In a recent Survey of Digital Library Aggregation Services, published by the Digital Library Federation, Martha Brogan praised the Open Language Archives Community as exemplary.
She concluded her discussion with the following statement:

OLAC is exemplary in several ways: the technical and social infrastructure that it has developed to support its community of contributors, based on shared principles and standards; the resources that it provides at its Web site about its purpose, scope, history, tools, news and events; and the efforts of its two leaders -- Gary Simons and Steven Bird [2003a, 2003b, 2003c] -- to articulate the challenges, analyze the options, and recommend possible solutions to their community of contributors in order to improve OLAC. With the formal appointment of an Outreach Working Group and its other efforts to accommodate small archives that lack technical support, OLAC's content and influence is likely to grow.

A Survey of Digital Library Aggregation Services
http://www.diglib.org/pubs/brogan/

Digital Library Federation
http://www.diglib.org/

NEW OLAC PUBLICATION TO APPEAR IN 2004

Steven Bird and Gary Simons (2004), Building an Open Language Archives Community on the DC Foundation, to appear in Hillmann and Westbrooks (editors), Metadata in Practice: A Work in Progress, ALA Editions.
http://www.language-archives.org/documents/mip.pdf

Abstract: The Open Language Archives Community is an international partnership of institutions and individuals that is creating a worldwide virtual library of language resources. We report on the development of OLAC metadata as a specialization of Dublin Core metadata and then describe the interoperability framework in which the metadata is validated, disseminated and aggregated. We also discuss the community-centered process by which OLAC standards and practices are created and maintained. In each of these three areas, metadata, interoperability, and process, we show how OLAC began with a model that was too cumbersome to implement then found a new formulation which worked in practice.
By reporting on this experience of metadata in practice, we hope to show how a specialist community can address its resource discovery needs by building on the Dublin Core foundation.

PROGRESS WITH ISO LANGUAGE CODES

The current international standard for language identification codes (ISO 639) covers only about 5% of known languages. OLAC's controlled vocabulary for identifying languages in resource metadata achieves complete coverage by augmenting ISO codes with codes for all living languages from SIL International's Ethnologue and for extinct and constructed languages from Linguist List. The present SIL and Linguist List codes are not compatible with existing ISO codes. However, work is in progress to align the SIL and Linguist List codes with the ISO codes and to define a new Part 3 of ISO 639 that will be a superset of ISO 639-2 and cover all known languages (past and present). A committee draft was balloted by member bodies of ISO/TC37/SC 2 and approved in January 2004 for advancement with revisions to the stage of Draft International Standard. Final adoption would be more than a year away since at least two more rounds of balloting are required before that is possible. It is anticipated that OLAC's vocabulary for identifying languages would change to the new standard if it is adopted.

OLAC's controlled vocabulary for identifying languages
http://www.language-archives.org/REC/language.html

OLAC ARCHIVE ON BOARD EUROPEAN SPACE AGENCY MISSION

On March 2, the Rosetta Disk left Earth on board an Ariane-5 rocket from the European Spaceport in Kourou, French Guiana. The mission's target is the comet Churyumov-Gerasimenko, which will be reached in 2014 after a "billiard ball" journey through the Solar System lasting more than ten years. The Rosetta Disk is a modern version of the Rosetta Stone. The 2-inch nickel disk is micro-etched with 30,000 pages of information covering over 1,000 languages.
For each language there is a simple dictionary, a guide to pronunciation and counting, and a traditional story with translation. Additionally, to help language decipherment in remote futures, a translation of a common text (the first three chapters of the book of Genesis) is provided in all languages. The disk can be read with the aid of an optical microscope. The materials on the disk come from the Rosetta 1000 Language Archive, an OLAC repository.

Rosetta 1000 Language Archive:
http://www.rosettaproject.org/live/search/languagesearch

European Space Agency Rosetta Mission:
http://www.esa.int/export/SPECIALS/Rosetta/

LanguageLog: Offsite backup for world's languages:
http://itre.cis.upenn.edu/~myl/languagelog/archives/000499.html

OLAC REPOSITORY EDITOR

OLAC invites every researcher and archive to become an OLAC Data Provider by submitting information about available language-oriented resources. The information requested includes the language described or analyzed, the format of the resource (e.g., web page, hard copy, cassette), how it can be accessed, and so on. Note that the data itself is not requested, so you still have full control over who accesses it. The OLAC Repository Editor (ORE) is a service offered at the LINGUIST site, ideally suited to individual repositories. For example, one could use ORE to set up a repository called "John Smith's Warlpiri Resources," with records for field notes, recordings, grammatical sketches, lexicons, unpublished papers, and so forth.

OLAC Repository Editor
http://www.linguistlist.org/olac/ore/

ARCHIVE REPORT CARDS

The archive report cards, recently added to the OLAC site, give summary statistics for each repository and an assessment of the quality of the repository's metadata. The report cards can be accessed by clicking the "REPORT CARD" links on the OLAC Archives page.
The service was developed by Amol Kamat, Baden Hughes, and Steven Bird at the University of Melbourne, with sponsorship from the Department of Computer Science and Software Engineering.

OLAC Archives Page (see "REPORT CARD" links):
http://www.language-archives.org/archives.php4

Report for full set of repositories:
http://www.language-archives.org/tools/reports/archiveReportCard.php?archive=all

Documentation on report cards:
http://www.language-archives.org/tools/reports/ExplainReport.html

OLAC PRESENTED AT OPEN ARCHIVES INITIATIVE WORKSHOP

Martin Wynne (Oxford Text Archive) presented OLAC at the Third Workshop on the Open Archives Initiative, held in Geneva last month.

Third Workshop on the Open Archives Initiative:
http://info.web.cern.ch/info/OAIP/

Slides from Martin's talk:
http://www.language-archives.org/events/talks/olac-oai3.pdf

OLAC ARTICLE IN LANGUAGELOG

A recent piece about OLAC in LanguageLog explains OLAC to a broad audience. It demonstrates the need for language archive search services by comparing OLAC and Google searches for Santa Cruz (a language of the Solomon Islands).

LanguageLog: Searching for Santa Cruz
http://itre.cis.upenn.edu/~myl/languagelog/archives/000647.html

Best wishes,
Steven & Gary

_______
Steven Bird, University of Melbourne (sb at csse.unimelb.edu.au)
Gary Simons, SIL International (gary_simons at sil.org)
OLAC Coordinators (www.language-archives.org)