From Gary_Simons at SIL.ORG Sat Dec 8 03:56:04 2001
From: Gary_Simons at SIL.ORG (Gary Simons)
Date: Fri, 7 Dec 2001 21:56:04 -0600
Subject: Relating marked up resources to software
Message-ID:

On 09/19/2001, Michel Jacobson posted to OLAC Implementers re: "The
metadata of the LACITO Archive are now coded in OLAC". I just found this
in a pocket of unread email and want to comment on one point:

>Remark: In formatting our metadata in OLAC, we felt the need for a
>convention allowing us to specify the software tools which we provide for
>processing certain resources. In particular, we provide specialized
>stylesheets, etc., for processing our XML documents. A possible solution
>would be to link the XML resource to a software resource via the <relation>
>element. The current definition of this element would require us to code
>the OAI identifier of the software resource as its content, e.g.:
>oai:lacito:myPgm. However, this leaves open the
>question of how the user is to call the software tool and apply it to a
>given XML resource. A non elegant solution might be to create an OAI
>identifier for each way the software in question can be called. Some
>extension of the semantics of the Relation element or of the controlled
>vocabulary of its attribute may be desirable for this purpose.

The solution that the current draft of the metadata standard recommends
(which was released a month after Michel's posting) is to use indirect
linkage via the element. That is, rather than the text resource saying
oai:lacito:myPgm, it would say oai:lacito:myDTD. Then the stylesheet would
also be a deposited resource with the very same declaration, since it is
a resource that is related to the very same DTD.

The trick then is that our community's centralized catalog service
provider should record, in the relational database it harvests into, the
relations between resources and the markup scheme they use.
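The indirect linkage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the service provider's actual implementation: the record layout (`id`, `type`, `markup`), the function names, and every identifier other than `oai:lacito:myDTD` are hypothetical. Each harvested record names the markup scheme it relates to, and the lookup goes from a text to the stylesheets that declare the same scheme.

```python
from collections import defaultdict

# Hypothetical harvested records. Only "oai:lacito:myDTD" comes from the
# message above; the other identifiers are invented for illustration.
RECORDS = [
    {"id": "oai:lacito:text1", "type": "text", "markup": "oai:lacito:myDTD"},
    {"id": "oai:lacito:myStylesheet", "type": "stylesheet", "markup": "oai:lacito:myDTD"},
    {"id": "oai:other:stylesheet2", "type": "stylesheet", "markup": "oai:lacito:myDTD"},
    {"id": "oai:other:text9", "type": "text", "markup": "oai:other:otherDTD"},
]


def index_by_markup(records):
    """Map each markup-scheme identifier to {resource type: [resource ids]}."""
    index = defaultdict(lambda: defaultdict(list))
    for rec in records:
        index[rec["markup"]][rec["type"]].append(rec["id"])
    return index


def stylesheets_for(resource_id, records):
    """Return ids of stylesheets that share a markup scheme with the resource."""
    by_id = {rec["id"]: rec for rec in records}
    markup = by_id[resource_id]["markup"]
    return index_by_markup(records)[markup]["stylesheet"]
```

Under these assumptions, `stylesheets_for("oai:lacito:text1", RECORDS)` finds both stylesheets, including the one deposited by a different archive, although the text record itself mentions neither.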
That service provider would then be able to augment its report on the DTD
itself by listing all the cataloged resources by type (e.g. text vs.
stylesheet) that are related to the DTD. This would get us from any text
that uses a particular DTD to any stylesheet that has been designed for
it (of which there could be dozens contributed by different developers in
different archives), even though the archive depositing the text doesn't
know about the other stylesheets.

Hope this makes sense,
-Gary Simons

From Gary_Simons at SIL.ORG Sat Dec 15 16:43:08 2001
From: Gary_Simons at SIL.ORG (Gary Simons)
Date: Sat, 15 Dec 2001 11:43:08 EST
Subject: OLAC Protocol for Metadata Harvesting
Message-ID:

Dear OLAC-Implementers,

Those of you who were at the workshop last December where OLAC was
founded will recall that the group proposed an alpha testing period
during which we would develop the basic standards, followed by an
official launch, followed by a one year freeze of the standards which
would serve as a period of beta testing and more widespread adoption. At
the end of the freeze, the standards would be revised based on feedback
from implementers, and released as version 1.0.

Our first launch event, a symposium at the annual meeting of the
Linguistic Society of America, is now three weeks away. There are three
standards documents that define OLAC and how it works:

   Process:
   http://www.language-archives.org/OLAC/process.html

   Metadata set:
   http://www.language-archives.org/OLAC/olacms.html

   Harvesting protocol:
   http://www.language-archives.org/OLAC/protocol.html

The first two should be familiar as they have been circulated on this
list before. The third is new; it describes the extensions that OLAC
makes to the OAI protocol for metadata harvesting.

Currently all three documents have "Proposed" status. According to our
process document, the next status is "Candidate" during which a standard
undergoes a period of testing before final adoption.
Our plan is to advance these documents to Candidate status before the
launch in January.

Another feature of our process is that you, the implementers, have the
major stake in setting the standards. We thus encourage you to review
these proposed standards one more time and give any feedback you have
concerning revisions you think should be incorporated into the versions
that will shortly be frozen for a year.

We particularly encourage you to look at the new document on the
harvesting protocol. One part will require each of you implementers to
modify your data provider. This is the proposed <olac-archive> element
used for archive description in the Identify response. You may recall
that the OAI protocol has a repeatable <description> element in the
Identify response that is designed for subcommunities to customize. This
new document proposes our community's customization.

OLAC is one year old this week. We thank you for your support, and for
helping get the initiative off to a great start.

Best wishes,

Gary & Steven
________

Steven Bird, University of Pennsylvania (sb at ldc.upenn.edu)
Gary Simons, SIL International (gary_simons at sil.org)
OLAC Coordinators (www.language-archives.org)

From sb at UNAGI.CIS.UPENN.EDU Thu Dec 27 15:25:38 2001
From: sb at UNAGI.CIS.UPENN.EDU (Steven Bird)
Date: Thu, 27 Dec 2001 10:25:38 EST
Subject: OLAC Protocol for Metadata Harvesting
Message-ID:

Folks,

Recently, Gary and I had some discussion on supporting multiple languages
in the archive description (i.e. collection-level metadata), as defined
in the OLAC-PMH:

I wrote:
> I think we need to permit a lang attribute for the text-valued elements.
> The other option - specifying that English will be used - is likely to
> be unacceptable. This then raises the possibility that archives might
> want to provide these text-valued elements in multiple languages, which
> starts to sound painful. Do you have any thoughts on this?

Gary wrote:
> This is a good point.
> After pondering it a bit, I think the way to do it
> would not be field by field, but for the whole record. <description> is
> already multiply occurring, so if we just add a lang attribute to
> <olac-archive>, then people could generate as many archive descriptions as
> they want in as many languages as they want. That would make the
> olac-archive correspond to a table in a service provider's database that
> would be in a one-to-many relationship with the archives table, which would
> be a whole lot easier than handling one-to-many at the field level.

I agree that the lang attribute should be specified at the level of the
record. Here is a mock-up for the National Archives of Canada / Archives
nationales du Canada. Lines that differ in the two versions are prefaced
with an asterisk.

* http://www.archives.ca/02/0201_e.html
* Mr. Ian E. Wilson
* National Archivist of Canada
* National Archives of Canada
  http://www.archives.ca/
* 395 Wellington Street, Ottawa, Ontario K1A 0N3, CANADA

* http://www.archives.ca/02/0201_f.html
* M. Ian E. Wilson
* Archiviste national du Canada
* Archives nationales du Canada
  http://www.archives.ca/
* 395, rue Wellington, OTTAWA (Ontario) K1A 0N3, CANADA
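To make the record-level approach concrete, here is a minimal Python sketch of the one-to-many relationship Gary mentions, using an in-memory SQLite database. It rests on assumptions: the table and column names, the treatment of http://www.archives.ca/ as the archive's key URL, and the language codes "en" and "fr" are all hypothetical, since the mock-up does not spell them out. Each archive has one row in the archives table and one description row per language.

```python
import sqlite3


def build_db():
    """Create a hypothetical service-provider schema with one description per language."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE archive (
            archive_id INTEGER PRIMARY KEY,
            url        TEXT NOT NULL
        );
        CREATE TABLE archive_description (
            archive_id  INTEGER NOT NULL REFERENCES archive(archive_id),
            lang        TEXT NOT NULL,   -- value of the record-level lang attribute
            institution TEXT NOT NULL,
            curator     TEXT NOT NULL,
            PRIMARY KEY (archive_id, lang)
        );
        """
    )
    conn.execute("INSERT INTO archive VALUES (1, 'http://www.archives.ca/')")
    conn.executemany(
        "INSERT INTO archive_description VALUES (?, ?, ?, ?)",
        [
            (1, "en", "National Archives of Canada", "Mr. Ian E. Wilson"),
            (1, "fr", "Archives nationales du Canada", "M. Ian E. Wilson"),
        ],
    )
    return conn


def describe(conn, archive_id, lang, fallback="en"):
    """Return (institution, curator) in the requested language, else the fallback."""
    return conn.execute(
        "SELECT institution, curator FROM archive_description "
        "WHERE archive_id = ? AND lang IN (?, ?) "
        "ORDER BY lang = ? DESC LIMIT 1",  # prefer the requested language
        (archive_id, lang, fallback, lang),
    ).fetchone()
```

With this layout, adding a third language is one more row in `archive_description`; nothing at the field level has to change, which is the simplification argued for above. The fallback language in `describe` is a design choice of this sketch, not something the proposal specifies.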

It would be a best practice for each version of the record to provide
semantically equivalent information.

Please let us know if anyone sees a problem with this simple approach to
supporting multiple languages for collection-level metadata. Please post
any responses directly to the list (simply by replying to the email).

Steven Bird

--
Steven.Bird at ldc.upenn.edu http://www.ldc.upenn.edu/sb
Assoc Director, LDC; Adj Assoc Prof, CIS & Linguistics
Linguistic Data Consortium, University of Pennsylvania
3615 Market St, Suite 200, Philadelphia, PA 19104-2608