From gary.holton at UAF.EDU Tue Jan 22 21:16:14 2002
From: gary.holton at UAF.EDU (Gary Holton)
Date: Tue, 22 Jan 2002 16:16:14 -0500
Subject: OLAC Schema version 0.4 released
Message-ID:

Note that I have been unable to register OLAC 0.4 with OAI because of the
type="institutional" attribute of the tag. It's unclear to me whether this
is a problem with the schema or with the OAI verifier. Nevertheless, I
registered without the type="institutional" attribute.

Gary Holton

From sb at UNAGI.CIS.UPENN.EDU Wed Jan 23 14:53:26 2002
From: sb at UNAGI.CIS.UPENN.EDU (Steven Bird)
Date: Wed, 23 Jan 2002 09:53:26 EST
Subject: OLAC Protocol for Metadata Harvesting
In-Reply-To: Your mail dated Thursday 27 December, 2001.
Message-ID:

Folks,

Back in December I reported on some discussions Gary and I had concerning
<olac-archive>, the OLAC Archive Description element (see section 3 of
http://www.language-archives.org/OLAC/protocol.html). This element
contains archive-level metadata - the data that describes the archive as
a whole.

The issue concerned support for archive descriptions in multiple
languages, and the proposed solution was to add a lang attribute to the
olac-archive element. Multiple instances of the element would then be
given, one per language, e.g.:

    ... National Archives of Canada ...
    .... Archives nationales du Canada ...

However, this extra feature adds complexity to our software:

- the databases must now keep this archive-level metadata in a separate
  table (to permit arbitrary numbers of versions)

- there need to be best practices about consistency of content across
  the different language versions

- we need to find a way to distinguish official names from translations,
  as we already had to for alternative titles
  [http://www.language-archives.org/OLAC/olacms.html#Title]

A simpler approach is to permit a single <olac-archive> element, and for
OLAC implementers to specify archive-level metadata in exactly the form
they want it to be presented by service providers. For example:

    ... National Archives of Canada / Archives nationales du Canada ...

This is consistent with the approach taken elsewhere in the protocol
document, where we have said that element content should be given in the
form in which it should be presented by service providers. For example:

> If more than one person has collaborated as personal sponsors of the
> archive, then this element should contain all the names in the order and
> format the collaborators want to be cited.

We could say something similar for multiple languages:

  "If the name of the sponsoring institution is standardly given in more
  than one language, then this element should contain all the names in
  the order and format required, e.g. National Archives of Canada /
  Archives nationales du Canada"

In this way, we are drawing a sharp distinction between item-level and
archive-level metadata. At the item level, multiple creators, titles,
languages etc are to be separated into distinct elements, e.g.:

    Na tala 'uria na idulaa diana
    The road to good reading
    Bloomfield, Leonard
    Haas, Mary

Service providers will make heavy use of this structure, both in indexing
materials, and in presenting them to end-users.

At the archive level, multiple creators, titles, languages etc are
collapsed into single elements (as we saw above), and service providers
can simply use these pre-formatted text strings to present end-users with
details of the harvested archives. We propose to add paragraph markup <p>
for the elements (like synopsis) which permit free text content, so that
implementers can separate content in different languages.

I hope this makes sense. Any comments are welcomed.

Thanks,
-Steven

--
Steven.Bird at ldc.upenn.edu  http://www.ldc.upenn.edu/sb
Assoc Director, LDC; Adj Assoc Prof, CIS & Linguistics
Linguistic Data Consortium, University of Pennsylvania
3615 Market St, Suite 200, Philadelphia, PA 19104-2608
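
As a rough illustration of the item-level versus archive-level
distinction discussed in the message above, the following Python sketch
shows how a service provider might treat the two kinds of metadata. The
element names institution, record, title and creator, and both helper
functions, are assumptions made for this example only; the original
markup in the message was not preserved, and this is not the OLAC
reference implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive description: a single element carrying both
# language versions in the form the archive wants displayed.
ARCHIVE_XML = """
<olac-archive>
  <institution>National Archives of Canada / Archives nationales du Canada</institution>
</olac-archive>
"""

# Hypothetical item record: repeated elements, one value per element.
ITEM_XML = """
<record>
  <title>Na tala 'uria na idulaa diana</title>
  <title>The road to good reading</title>
  <creator>Bloomfield, Leonard</creator>
  <creator>Haas, Mary</creator>
</record>
"""


def archive_display(xml_text):
    """Map each archive-level element to its pre-formatted display string."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def item_index(xml_text):
    """Collect repeated item-level elements into lists, ready for indexing."""
    root = ET.fromstring(xml_text)
    index = {}
    for child in root:
        index.setdefault(child.tag, []).append((child.text or "").strip())
    return index


if __name__ == "__main__":
    # Archive level: the string is shown to end-users exactly as supplied.
    print(archive_display(ARCHIVE_XML))
    # Item level: each creator and title is a separate, searchable value.
    print(item_index(ITEM_XML))
```

Under these assumptions the archive-level code never has to split or
merge names, while the item-level code keeps every creator and title as
its own value.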