From gary_simons at SIL.ORG Thu Mar 6 04:35:51 2008
From: gary_simons at SIL.ORG (Gary Simons)
Date: Wed, 5 Mar 2008 23:35:51 -0500
Subject: Call for review of new metadata documents
Message-ID:

Dear implementers,

Many of you also subscribe to the OLAC-GENERAL list and so have gotten the general announcement about this call for review of new metadata documents. Those of you who have implemented an OLAC data provider are directly affected, since this new work focuses on ways of improving the quality of the metadata in our implementations. In this message we repeat the general announcement for the benefit of those not subscribed to OLAC-GENERAL, and then we supply further information that is relevant to you as implementers.

Six months ago the US National Science Foundation awarded funding for a project named "OLAC: Accessing the World's Language Resources", which aims to greatly improve access to language resources for linguists and the broader communities of interest. If you are interested in learning more about the project, you may visit the project home page at:

http://olac.wiki.sourceforge.net/

In the first phase of the project we are focusing on improving metadata quality as a prerequisite to improving the quality of search. To that end we have drafted some new documents that can serve as a basis for improving and measuring metadata quality within our community:

Best Practice Recommendations for Language Resource Description
http://www.language-archives.org/REC/bpr.html

OLAC Metadata Usage Guidelines
http://www.language-archives.org/NOTE/usage.html

OLAC Metadata Quality Metrics
http://www.language-archives.org/NOTE/metrics.html

These documents have been reviewed in Draft status by the Metadata Working Group. After significant revision, they are now promoted to Proposed status and are thus ready for review by the entire community. In keeping with the OLAC Process standard, we hereby make a formal call for review.
The review period will end on MARCH 31, at which point all of the comments that have been received will be processed to create revised versions of the documents. You may submit comments by simply replying to this message.

The OLAC Metadata Standard that you followed in implementing your repository defines the constraints on validity for a metadata record, but it gives no advice about what a high-quality metadata record is like. The first two documents listed above address this issue. Then, in keeping with the OLAC core value of "Peer Review", we want to implement a service that will measure conformance to those recommendations that can be tested automatically. That is the issue addressed by the third document listed above.

We have implemented the proposed Metadata Quality Score so that you can see the implications for your current metadata. (As the documents are revised to express community consensus, the implementation of the metrics will be updated to match.) The metadata quality analysis as currently implemented is accessible from a test version of the Participating Archives page. The site has no links to this test page; it is accessed by entering this URL in a browser:

http://www.language-archives.org/archives-new.php

Follow the "Sample Record" link for your archive to see the quality score for the sample record named in your Identify response, along with comments on what can be done to improve the score. Follow the "Metrics" link to see the average quality score for the records you are currently providing. Kudos to the Audio Archive of Linguistic Fieldwork (Berkeley), Centre de Ressources pour la Description de l'Oral (CRDO), and the CHILDES Data Repository, which are already getting scores around 8 or higher. The rest of us have room for significant improvement!

Eventually, this new Participating Archives page will replace the one that is currently accessed from the ARCHIVES link in the OLAC site banner. However, this will not happen right away.
After the current round of review and any subsequent revisions, the documents will be put to the OLAC Council, who will check the revised documents and promote them to Candidate status when they feel they are ready. Next we will issue a call for implementation and give at least one month for implementer feedback. Based on that feedback, final revisions will be made to the satisfaction of the Council, who will then grant Adopted status. The new Participating Archives page will not replace the current one until the new guidelines and metrics are adopted.

This discussion of process is to let you know that you will probably want to plan to update the implementation of your metadata repository sometime within the next few months. When these new metadata recommendations and usage guidelines are officially adopted, the public will be able to see the metrics scores for your repository. In the meantime, only other implementers can see them. You need not wait until the Candidate call for implementation to begin implementing changes. As soon as your updated repository is harvested, you will see the metrics change.

Again, the review period will end on MARCH 31, at which point all of the comments that have been received will be processed to create revised versions of the documents. You may submit comments by replying to the list (and potentially entering into discussion with other implementers) or by mailing them to . That account is handled by Debbie Chang, a master's candidate at the Graduate Institute of Applied Linguistics who is the Research Assistant for our project. She will compile a list of all the comments (whether submitted to the list or to the project account), to which the document editors will then be asked to respond. That response will come after the review period closes.
With a solid foundation based on quality metadata, our grant project will be able to build improved search services and to expand coverage by attracting more participating archives and by implementing gateways to other aggregators. We are grateful for your participation in this venture and trust that you share our excitement about its potential.

Best wishes,
Gary & Steven

_______
Steven Bird, University of Melbourne and University of Pennsylvania
Gary Simons, SIL International and GIAL
OLAC Coordinators (www.language-archives.org)

From jcgood at BUFFALO.EDU Thu Mar 6 15:44:40 2008
From: jcgood at BUFFALO.EDU (Jeff Good)
Date: Thu, 6 Mar 2008 10:44:40 -0500
Subject: Call for review of new metadata documents
In-Reply-To:
Message-ID:

Dear OLAC-Implementers,

First, let me thank Gary and Steven for pulling together all these comments and making the usage guidelines revisions. It's great to see these things moving forward.

It will take me some time to put together all of my comments on the revised guidelines, but I have one technical question now that I'm hoping those better informed about Dublin Core can answer.

It's agreed that the isTranscriptOf and hasTranscript relations are needed, but the conclusion is that we can't do anything about this in revision 1.1; it has to be held off until revision 2. What I don't understand is why we can't use the existing model of OLAC controlled vocabulary refinements for this in the meantime. For example, why can't we use something like the second element, which looks to me to be mostly parallel to the first element, which comes out of the guidelines, using the prescribed method for encoding subject language:

some-unique-identifier

I guess the problem here is that "olac:code" in the second case would not be encoding a "thing" but a "relation". But is that so bad? Could we just call this "olac:predicates" to deal with this? The OLAC->DC mapping would still be straightforward (just strip out the "olac:" attributes), right?
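Jeff's parenthetical claim, that mapping such a record to simple Dublin Core only requires stripping the "olac:" attributes, can be illustrated with a short Python sketch using the standard-library ElementTree module. It assumes the OLAC 1.1 namespace URI shown, also drops xsi:type (which accompanies olac:code in OLAC's extension mechanism), and uses a hypothetical type name, "olac:linguistic-relation", for the relation, since no such type was ever defined; "xyz" is a placeholder language code. It is a sketch of the idea, not OLAC's actual crosswalk, and it leaves the record's wrapper element alone.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (OLAC 1.1 and the W3C schema-instance namespace).
OLAC_NS = "http://www.language-archives.org/OLAC/1.1/"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Example record. "xyz" is a placeholder language code and
# "olac:linguistic-relation" is a hypothetical type, not a defined OLAC extension.
SAMPLE = """<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:subject xsi:type="olac:language" olac:code="xyz"/>
  <dc:relation xsi:type="olac:linguistic-relation"
      olac:code="isTranscriptOf">some-unique-identifier</dc:relation>
</olac:olac>"""


def olac_to_simple_dc(record_xml: str) -> ET.Element:
    """Parse an OLAC record and drop every olac:* attribute plus xsi:type,
    leaving element names and text content untouched."""
    root = ET.fromstring(record_xml)
    for elem in root.iter():
        # Copy the keys first, since we delete from the dict while looping.
        for name in list(elem.attrib):
            if name.startswith("{" + OLAC_NS + "}") or name == XSI_TYPE:
                del elem.attrib[name]
    return root


if __name__ == "__main__":
    ET.register_namespace("olac", OLAC_NS)
    ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")
    print(ET.tostring(olac_to_simple_dc(SAMPLE), encoding="unicode"))
```

After the call, dc:relation still carries "some-unique-identifier", but the fact that it is a transcript relation (the olac:code value) is gone; that loss of precision is the cost of the simple mapping.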
If this isn't possible for some technical reason, however, perhaps we can get a jump start on what will be an important feature of OLAC 2.0 by having someone draft the document that will be needed to describe these refinements at some point anyway?

I also wonder why, as a stopgap measure, there can't be a recommendation about how to encode this in the meantime, even if it is not officially part of the standard. For example, can't we informally agree to at least do something like this:

IsTranscriptOf: some-unique-identifier

At the very least, this should help people prepare for the fact that in OLAC 2.0 there will be an official way to code this.

Jeff

From Gary_Simons at SIL.ORG Tue Mar 25 02:16:52 2008
From: Gary_Simons at SIL.ORG (Gary Simons)
Date: Mon, 24 Mar 2008 21:16:52 -0500
Subject: Reminder: Call for review of new metadata documents
In-Reply-To:
Message-ID:

Dear implementers,

This is a reminder that we have one week left in the review period for the documents listed in the attached message. We are anxiously awaiting your feedback!

So far we have gotten just one comment, namely, from Jeff Good, asking about the possibility of using a solution like the following for isTranscriptOf and hasTranscript:

Such a solution would be possible, but since isTranscriptOf is analogous to isVersionOf (and the other refinements of dc:relation), it really should be a new element (in the olac namespace) that is defined as a refinement of dc:relation, which would also enable it to take the encoding schemes that dc:relation takes, e.g.

This "proper" solution takes us beyond conformance to the current XML schema for qualified Dublin Core, so our thinking is that we don't want to implement a change like that, but would rather wait for the revision of the XML schema for qualified DC (due out this year) that will support such extensions.
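Gary's preferred approach treats isTranscriptOf as its own element refining dc:relation. Under Dublin Core's dumb-down principle, a consumer that does not understand a refinement can fall back to the parent element. A minimal Python sketch of that fallback follows; it assumes hypothetical olac:isTranscriptOf and olac:hasTranscript elements (neither exists in OLAC 1.1), the OLAC 1.1 namespace URI, and dcterms:URI as an example encoding scheme. The function name and sample URL are illustrative only.

```python
import xml.etree.ElementTree as ET

DC_RELATION = "{http://purl.org/dc/elements/1.1/}relation"
# Assumed OLAC 1.1 namespace URI.
OLAC_NS = "http://www.language-archives.org/OLAC/1.1/"

# Hypothetical refinements of dc:relation; neither is defined in OLAC 1.1.
RELATION_REFINEMENTS = {"isTranscriptOf", "hasTranscript"}

SAMPLE = """<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Example transcript</dc:title>
  <olac:isTranscriptOf xsi:type="dcterms:URI">http://example.org/recording/42</olac:isTranscriptOf>
</olac:olac>"""


def dumb_down(root: ET.Element) -> ET.Element:
    """Rename each hypothetical olac:<refinement> element to dc:relation,
    keeping its text and discarding encoding-scheme attributes."""
    prefix = "{" + OLAC_NS + "}"
    for elem in root.iter():
        if elem.tag.startswith(prefix) and elem.tag[len(prefix):] in RELATION_REFINEMENTS:
            elem.tag = DC_RELATION
            elem.attrib.clear()
    return root


if __name__ == "__main__":
    ET.register_namespace("olac", OLAC_NS)
    ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")
    print(ET.tostring(dumb_down(ET.fromstring(SAMPLE)), encoding="unicode"))
```

As with any dumb-down, the consumer learns that the record is related to the given URI, but not that it is a transcript of it.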
We are also not keen to go to all the work of defining and implementing the olac:lingrelations extension (which includes writing a document and putting it through the stages of the review process) for a short-lived temporary solution. Thus, we have these new refinements on the list of changes for version 2.0 of our metadata format.

-Gary

From jcgood at BUFFALO.EDU Tue Mar 25 23:03:25 2008
From: jcgood at BUFFALO.EDU (Jeff Good)
Date: Tue, 25 Mar 2008 19:03:25 -0400
Subject: Reminder: Call for review of new metadata documents
In-Reply-To:
Message-ID:

Dear Gary,

Thanks for the clarification regarding the Relation element. It's too bad we're stuck waiting for DC to finish its process. Would it make sense for us to start the document process for this refinement before they officially release the new schema for qualified Dublin Core? Then we could take advantage of it quickly once it's official.

I have one other question about the new documents at this point, again regarding granularity.
The new discussion is, I think, quite welcome, and sufficiently detailed and clear to be put to use. I still find that a bit of context is missing, though. The background assumption seems to be that OLAC metadata is intended for certain kinds of search (I don't know of a good way to define those kinds of search other than to say they are approximately Google-like). This certainly has been an assumption driving OLAC for quite some time. The problem, as I see it, is that nowhere in the OLAC documents (that I'm aware of) is this assumption explicitly laid out.

Perhaps I'm the only one who reads these things, but the Mission statement (pasted here) doesn't even explicitly talk about search at all:

"OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources."

Since the current granularity recommendations are only indirectly connected to the Mission, it would be nice if the relevant rationale for them were given. In fact, to be honest, I'm not sure precisely what the rationale is, since I can imagine two fairly distinct ones: (i) that OLAC's mission has changed and its primary focus is to serve as a bridge between linguistic repositories and digital library initiatives like OAI (an excellent mission, if more limited than the current one), or (ii) that OLAC has determined that the most useful step it can make towards its ultimate mission at present is to facilitate language resource discovery in an OAI context.
While clarifying this issue is perhaps not all that important to move forward with current work, it obviously could be pretty important down the road, in particular as search technologies and our ideas about what we want to search for and how we want to do it change.

Jeff

From jcgood at BUFFALO.EDU Tue Mar 25 23:54:16 2008
From: jcgood at BUFFALO.EDU (Jeff Good)
Date: Tue, 25 Mar 2008 19:54:16 -0400
Subject: Specifying content of elements specifying languages
In-Reply-To:
Message-ID:

Dear Gary (and others),

I also wanted to re-raise an issue regarding the nature of the text content of elements specifying the languages of a resource (though the issue relates primarily to subject languages, not the description languages).
The resolution is described as follows (emphasis added):

"There are two different issues here. The first regards clarifying the nature of the text content. The first comment points out that there are two main uses of the text content (for an alternate name or for a variety name) and asks if there should be a way to distinguish these two cases. *This can be done by means of the wording of the text content; the most straightforward approach is to add the word "dialect" after the name in the case of a variety name.* An example like this is given in the document. Other cases of indicating a variety, such as "Women's speech," don't involve a name at all and so do not pose a problem. Still other cases of using the content (like the note that Heidi Johnson gives above as an example) can include multiple names, including both alternate names and variety names."

I was actually hoping for something stronger than what's given in the highlighted (with "**") sentence. It's been possible for some time to specify any number of fine-grained details in the element content. The issue that concerned me was that there seem to be a few kinds of language refinement likely to be sufficiently frequent that it may make sense to have standardized conventions for encoding them. "Dialect" is the most obvious one. Of course, without standardized codes for dialects it would be hard for people to search for them, but it would still be good if there were a standardized way to see that a resource represents some dialect. For large languages, for example, people might care a bit about which dialect they're getting.

I can think of two ways to do such standardization, one easier than the other. The easy way is just to stipulate how to say, in the element content, that a name refers to a dialect. This could be as simple as saying the name should be followed by the word "dialect" (as opposed to, say, being followed by "variety" or being preceded by "dialect: ").
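Part of the appeal of the "easy way" is that a fixed textual convention is machine-checkable. A minimal Python sketch, assuming the convention is exactly "<name> dialect" (case-insensitive, with whitespace normalized); the function and type names are hypothetical. Content that does not match is reported as unmarked rather than as an alternate name because, as the quoted resolution notes, entries such as "Women's speech" contain no name at all.

```python
from typing import NamedTuple


class LanguageContent(NamedTuple):
    name: str              # the text, minus any trailing "dialect" marker
    is_variety_name: bool  # True only when the stipulated marker was found


def parse_language_content(text: str) -> LanguageContent:
    """Classify the text content of a language element under the proposed
    convention that a variety name is written as '<name> dialect'."""
    cleaned = " ".join(text.split())  # collapse runs of whitespace
    words = cleaned.split(" ")
    # Require at least one word before the marker, so a bare "dialect" is not a name.
    if len(words) >= 2 and words[-1].lower() == "dialect":
        return LanguageContent(" ".join(words[:-1]), True)
    # Anything else is left unmarked: an alternate name, a description such as
    # "Women's speech", or a free-text note.
    return LanguageContent(cleaned, False)


if __name__ == "__main__":
    for sample in ["Saracatsan dialect", "Women's speech", "dialect: Saracatsan"]:
        print(repr(sample), "->", parse_language_content(sample))
```

Note that "dialect: Saracatsan" and "Saracatsan variety" are not recognized, which is exactly why stipulating a single form matters.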
The second would be to add a possible refinement attribute, let's call it olac:refinement, with a controlled vocabulary consisting of, for example, "dialect" and "alternate". Thus, we would adapt this guidelines example:

Saracatsan dialect

To this:

Saracatsan

I don't know the DC restrictions well enough to know if this is appropriate. Maybe it falls under the rubric of qualified Dublin Core, in which case nothing can be easily done right now.

Jeff

From hdry at LINGUISTLIST.ORG Wed Mar 26 13:12:18 2008
From: hdry at LINGUISTLIST.ORG (Helen Aristar-Dry)
Date: Wed, 26 Mar 2008 09:12:18 -0400
Subject: Reminder: Call for review of new metadata documents
In-Reply-To: <78BA080F-24E5-4A23-BD3D-1AF352B40436@buffalo.edu>
Message-ID:

Hello, Gary (and all),

I just wanted to second both of Jeff's points.

I realize that I have always assumed that OLAC metadata is designed to facilitate resource discovery, not full resource description (which might be left to the IMDI metadata set, or another more elaborate standard). I recall mentions of the fact that a researcher typically wants to find anything written on an endangered language, so just knowing the language code of some resources may be adequate. And other discussions seem to have assumed that an archive will most likely not use OLAC metadata as its primary metadata set, but rather export a subset of its descriptive, technical, and administrative metadata in OLAC format. It seems to me we routinely talk of OLAC as though its primary purpose is resource discovery.

This is a perfectly justifiable and reasonable mission, as Jeff notes. It has the advantage of (a) being doable and (b) filling a niche. But I do think the mission statement should reflect it. Such clarity would be helpful to those of us who routinely try to promote OLAC.

Even within the context of resource discovery, however, 'hasTranscript' would seem to be an important descriptor.
In a typical linguist's collection, where nine-tenths of the recordings have not been transcribed, 'hasTranscript' would distinguish those that another researcher would most want to find. I can understand your not wanting to do a lot of work to produce a temporary solution, of course. But this is something that has been frequently requested, so maybe OLAC could put it on some 'must-do' list. And thank you for all the work you and Joan and Steven are doing. I think the OLAC Users' Guide is a very helpful and well-conceived document. All the best from snowy Michigan. -Helen Jeff Good wrote: > Dear Gary, > > Thanks for the clarification regarding the Relation element. It's too > bad we're stuck waiting for DC to finish its process. Would it make > sense for us to start the document process for this refinement before > they officially release the new schema for qualified Dublin Core? Then > we could take advantage of it quickly once it's official. > > I have one other question about the new documents at this point, again > regarding granularity. The new discussion I think is quite welcome and > sufficiently detailed and clear to be put to use. I still find that a > bit of context is missing though. The background assumption seems to be > that OLAC metadata is intended for certain kinds of search (I don't know > of a good way to define those kinds of search other than to say they are > approximately Google-like). This certainly has been an assumption > driving OLAC for quite some time. The problem, as I see it, is that > nowhere in the OLAC documents (that I'm aware of) is this assumption > explicitly laid out. 
> > Perhaps I'm the only one who reads these things, but the Mission > statement (pasted here) doesn't even explicitly talk about search at all: > > "OLAC, the Open Language Archives Community, is an international > partnership of institutions and individuals who are creating a worldwide > virtual library of language resources by: (i) developing consensus on > best current practice for the digital archiving of language resources, > and (ii) developing a network of interoperating repositories and > services for housing and accessing such resources." > > Since the current granularity recommendations are only indirectly > connected to the Mission, it would be nice if the relevant rationale for > them were given. In fact, to be honest, I'm not sure what the rationale > is precisely since I can imagine two fairly distinct ones: (i) that > OLAC's mission has changed and its primary focus is to serve as a bridge > between linguistic repositories and digital library initiatives like OAI > (an excellent mission, if more limited than the current one) or (ii) > that OLAC has determined that the most useful step it can make towards > its ultimate mission at present is to facilitate language resource > discovery in an OAI context. > > While clarifying this issue is perhaps not all that important to move > forward with current work, it obviously could be pretty important down > the road, in particular as search technologies and our ideas about what > we want to search for and how we want to do it change. > > Jeff > > > > > On Mar 24, 2008, at 10:16 PM, Gary Simons wrote: > >> Dear implementers, >> >> This is a reminder that we have one week left in the review period for >> the >> documents listed in the attached message. We are anxiously awaiting >> your >> feedback! 
>> >> So far we have gotten just one comment, namely, from Jeff Good asking >> about >> the possibility of using a solution like the following for isTranscriptOf >> and hasTranscript: >> >> >> >> Such a solution would be possible, but since isTranscriptOf is >> analogous to >> isVersionOf (and the other refinements of dc:relation), it really >> should be >> a new element (in the olac namespace) that is defined as a refinement of >> dc:relation, which would also enable it to take the encoding schemes that >> dc:relations take, e.g. >> >> >> >> This "proper" solution takes us beyond conformance to the current XML >> schema for qualified Dublin Core, so our thinking is that we don't >> want to >> implement a change like that, but rather wait for the revision of the XML >> schema for qualified DC (due out this year) that will support such >> extensions. We are also not keen to go to all the work of defining and >> implementing the olac:lingrelations extension (which includes writing a >> document and putting it through the stages of the review process) for a >> short-lived temporary solution. Thus, we have these new refinements on >> the >> list of changes for version 2.0 of our metadata format. >> >> -Gary >> >> -- Helen Aristar-Dry Professor of Linguistics Director, Institute for Language Information and Technology (ILIT) Eastern Michigan University 2000 Huron River Rd., Suite 104 Ypsilanti, MI 48197 734.487.0144 (ILIT office) 734.487.7952 (faculty office) 734.482.0132 (fax) hdry at linguistlist.org From Gary_Simons at SIL.ORG Fri Mar 28 04:15:51 2008 From: Gary_Simons at SIL.ORG (Gary Simons) Date: Thu, 27 Mar 2008 23:15:51 -0500 Subject: Reminder: Call for review of new metadata documents In-Reply-To: <47EA4BB2.5030201@linguistlist.org> Message-ID: Jeff and Helen, I do think the mission statement speaks to the issue you are asking about, but it is clearly implicit and wrapped up in what I hope is a shared understanding of the term "library". 
As the mission statement says, the purpose of OLAC is to "create a virtual library of language resources." A simplistic model of what a library does is that it: (1) builds a collection of resources, (2) curates that collection over the long term, and (3) maintains a catalog that helps its users find the resources that are relevant to them. Since OLAC is a virtual library it doesn't need to do point (2) of curating a collection--each of our participating archives is doing that for their piece of the virtual collection.
But OLAC is (1) building the virtual collection by recruiting more archives to participate and (as our current grant project unfolds) developing gateways to other aggregated catalogs, and (3) maintaining a catalog to help users find resources. The OLAC metadata standard is, of course, the specification for how to create an entry for the catalog. The discovery goals that the granularity guidelines reflect are based on what we would typically expect from a library catalog. When it comes to books, for instance, the library catalog helps us find a book that we can judge to be potentially relevant based on title and author and subject and the like, but it does not give us the detailed table of contents. We have to open the book to find that. Similarly, the catalog for a library or archive typically treats something like a collection of field notes and recordings (that have the same provenance) as a single item (which is why we have DCMI type Collection). If the catalog record sounds relevant, then we have to open the collection to find the detailed table of contents. In library cataloging practice, it is the collection that is analogous to a book, rather than an individual recorded session. Thus I think this interpretation of desired granularity is straightforwardly implied by the OLAC mission of creating a virtual library. If you have any ideas of specific wording changes in the granularity guidelines that might help to clarify this, I'll be glad to hear them. The current growth edge of work in the OAI is on developing a standard for describing the detailed contents of a compound object in an interoperable way. It is called OAI-ORE (for Object Reuse and Exchange), currently released in an alpha version: http://www.openarchives.org/ore/ It does not replace the OAI_DC description or change the basic catalog. Rather, when available, it is a second description of a resource that identifies all of its components and how they function and relate to each other. 
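Gary's distinction between the catalog entry and the detailed contents can be sketched as two separate descriptions. In this minimal Python illustration, the catalog holds one record per collection (typed with the DCMI Type term Collection), and a separate aggregation object lists the components and their roles, loosely in the spirit of OAI-ORE. The class names, identifiers and roles are hypothetical, and the sketch does not model the actual ORE vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    """One catalog entry: the unit a user discovers, like a book in a library."""
    identifier: str
    title: str
    dc_type: str  # a DCMI Type term, e.g. "Collection"


@dataclass
class Aggregation:
    """Hypothetical second description that lists a resource's components."""
    described: str  # identifier of the catalog record being described
    components: dict = field(default_factory=dict)  # component id -> role

    def add(self, component_id: str, role: str) -> None:
        self.components[component_id] = role

    def with_role(self, role: str) -> list:
        return sorted(c for c, r in self.components.items() if r == role)


def search_catalog(catalog: list, term: str) -> list:
    """Catalog search matches record-level metadata only, not component files."""
    term = term.lower()
    return [rec.identifier for rec in catalog if term in rec.title.lower()]


# Hypothetical example: one field collection, catalogued as a single item.
catalog = [CatalogRecord("fieldwork-2007", "Field notes and recordings, 2007", "Collection")]
contents = Aggregation("fieldwork-2007")
contents.add("session-01.wav", "recording")
contents.add("session-02.wav", "recording")
contents.add("notebook-1.pdf", "field notes")

print(search_catalog(catalog, "recordings"))  # the collection is found
print(search_catalog(catalog, "session-01"))  # components are not in the catalog
print(contents.with_role("recording"))        # "opening the book" to see contents
```

Keeping the two apart mirrors the library analogy: the catalog stays coarse-grained and comparable across archives, while component-level detail lives in a second description that specialized services can use.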
It can be used to implement services that make more intelligent use of resources. Once we have done a conversion to the new style of qualified DC description and once the OAI-ORE spec is established, we may well want to work on OLAC guidelines for applying OAI-ORE so that we can intelligently handle compound objects. Best, -Gary From hdry at LINGUISTLIST.ORG Fri Mar 28 14:19:12 2008 From: hdry at LINGUISTLIST.ORG (Helen Aristar-Dry) Date: Fri, 28 Mar 2008 10:19:12 -0400 Subject: Reminder: Call for review of new metadata documents In-Reply-To: Message-ID: That's a great explanation, Gary; and I'll buy everything in it except that it is "straightforwardly implied"! (Great oxymoronic phrase!) Seriously, I will buy this reasoning when I read it, as below; and, as you know, I think that resource discovery IS the right focus for OLAC. But I don't think that everyone gets that from the mission statement. Why don't you just add part of the explanation below to the mission statement and clarify it for everyone. It could read "... create a virtual library of language resources through building a collection and maintaining a resource catalog. OLAC is (1) building a virtual collection through archive recruitment and the development of gateways to other aggregated catalogs and (2) maintaining an online metadata catalog to aid in resource discovery. The OLAC metadata standard is the specification for creating an entry for the catalog."
But maybe Jeff will have a better idea. Thanks, -Helen Gary Simons wrote: > Jeff and Helen, > > I do think the mission statement speaks to the issue you are asking about, > but it is clearly implicit and wrapped up in what I hope is a shared > understanding of the term "library". As the mission statement says, the > purpose of OLAC is to "create a virtual library of language resources." > > A simplistic model of what a library does is that it: (1) builds a > collection of resources, (2) curates that collection over the long term, > and (3) maintains a catalog that helps its users find the resources that > are relevant to them. Since OLAC is a virtual library it doesn't need to > do point (2) of curating a collection--each of our participating archives > is doing that for their piece of the virtual collection. But OLAC is (1) > building the virtual collection by recruiting more archives to participate > and (as our current grant project unfolds) developing gateways to other > aggregated catalogs, and (3) maintaining a catalog to help users find > resources. The OLAC metadata standard is, of course, the specification for > how to create an entry for the catalog. > > The discovery goals that the granularity guidelines reflect are based on > what we would typically expect from a library catalog. When it comes to > books, for instance, the library catalog helps us find a book that we can > judge to be potentially relevant based on title and author and subject and > the like, but it does not give us the detailed table of contents. We have > to open the book to find that. Similarly, the catalog for a library or > archive typically treats something like a collection of field notes and > recordings (that have the same provenance) as a single item (which is why > we have DCMI type Collection). If the catalog record sounds relevant, then > we have to open the collection to find the detailed table of contents.
In > library cataloging practice, it is the collection that is analogous to a > book, rather than an individual recorded session. Thus I think this > interpretation of desired granularity is straightforwardly implied by the > OLAC mission of creating a virtual library. If you have any ideas of > specific wording changes in the granularity guidelines that might help to > clarify this, I'll be glad to hear them. > > The current growth edge of work in the OAI is on developing a standard for > describing the detailed contents of a compound object in an interoperable > way. It is called OAI-ORE (for Object Reuse and Exchange), currently > released in an alpha version: > > http://www.openarchives.org/ore/ > > It does not replace the OAI_DC description or change the basic catalog. > Rather, when available, it is a second description of a resource that > identifies all of its components and how they function and relate to each > other. It can be used to implement services that make more intelligent use > of resources. Once we have done a conversion to the new style of qualified > DC description and once the OAI-ORE spec is established, we may well want > to work on OLAC guidelines for applying OAI-ORE so that we can > intelligently handle compound objects. > > Best, > -Gary > > Helen Aristar-Dry, 03/26/2008 08:12 AM > Sent by: OLAC Implementers List > To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG > Subject: Re: Reminder: Call for review of new metadata documents > Please respond to: Open Language Archives Community Implementers List > > Hello, Gary (and all), > > I just wanted to second both Jeff's points. I realize that I have > always assumed that OLAC metadata is designed to facilitate resource > discovery, not full resource description (which might be left to the > IMDI metadata set, or another more elaborated standard).
I recall > mentions of the fact that a researcher typically wants to find anything > written on an endangered language, so just knowing the language code of > some resources may be adequate. And other discussions seem to have > assumed that an archive will most likely not use OLAC metadata as its > primary metadata set, but rather export a subset of its descriptive and > technical and administrative metadata in OLAC format. It seems to me > we routinely talk of OLAC as though its primary purpose is resource > discovery. This is a perfectly justifiable and reasonable mission, as > Jeff notes below. It has the advantage of (a) being doable and (b) > filling a niche. But I do think the mission statement should reflect > it. Such clarity would be helpful to those of us who routinely try to > promote OLAC. > > Even within the context of resource discovery, however, 'hasTranscript' > would seem to be an important descriptor. In a typical linguist's > collection, where nine-tenths of the recordings have not been > transcribed, 'hasTranscript' would distinguish those that another > researcher would most want to find. I can understand your not wanting > to do a lot of work to produce a temporary solution, of course. But > this is something that has been frequently requested, so maybe OLAC > could put it on some 'must-do' list. > > And thank you for all the work you and Joan and Steven are doing. I > think the OLAC Users' Guide is a very helpful and well-conceived document. > > All the best from snowy Michigan. > -Helen > > Jeff Good wrote: >> Dear Gary, >> >> Thanks for the clarification regarding the Relation element. It's too >> bad we're stuck waiting for DC to finish its process. Would it make >> sense for us to start the document process for this refinement before >> they officially release the new schema for qualified Dublin Core? Then >> we could take advantage of it quickly once it's official.
>> >> I have one other question about the new documents at this point, again >> regarding granularity. The new discussion I think is quite welcome and >> sufficiently detailed and clear to be put to use. I still find that a >> bit of context is missing though. The background assumption seems to be >> that OLAC metadata is intended for certain kinds of search (I don't know >> of a good way to define those kinds of search other than to say they are >> approximately Google-like). This certainly has been an assumption >> driving OLAC for quite some time. The problem, as I see it, is that >> nowhere in the OLAC documents (that I'm aware of) is this assumption >> explicitly laid out. >> >> Perhaps I'm the only one who reads these things, but the Mission >> statement (pasted here) doesn't even explicitly talk about search at all: >> >> "OLAC, the Open Language Archives Community, is an international >> partnership of institutions and individuals who are creating a worldwide >> virtual library of language resources by: (i) developing consensus on >> best current practice for the digital archiving of language resources, >> and (ii) developing a network of interoperating repositories and >> services for housing and accessing such resources." >> >> Since the current granularity recommendations are only indirectly >> connected to the Mission, it would be nice if the relevant rationale for >> them were given. In fact, to be honest, I'm not sure what the rationale >> is precisely since I can imagine two fairly distinct ones: (i) that >> OLAC's mission has changed and its primary focus is to serve as a bridge >> between linguistic repositories and digital library initiatives like OAI >> (an excellent mission, if more limited than the current one) or (ii) >> that OLAC has determined that the most useful step it can make towards >> its ultimate mission at present is to facilitate language resource >> discovery in an OAI context. 
>> >> While clarifying this issue is perhaps not all that important to move >> forward with current work, it obviously could be pretty important down >> the road, in particular as search technologies and our ideas about what >> we want to search for and how we want to do it change. >> >> Jeff >> >> >> >> >> On Mar 24, 2008, at 10:16 PM, Gary Simons wrote: >> >>> Dear implementers, >>> >>> This is a reminder that we have one week left in the review period for >>> the >>> documents listed in the attached message. We are anxiously awaiting >>> your >>> feedback! >>> >>> So far we have gotten just one comment, namely, from Jeff Good asking >>> about >>> the possibility of using a solution like the following for > isTranscriptOf >>> and hasTranscript: >>> >>> >>> >>> Such a solution would be possible, but since isTranscriptOf is >>> analogous to >>> isVersionOf (and the other refinements of dc:relation), it really >>> should be >>> a new element (in the olac namespace) that is defined as a refinement of >>> dc:relation, which would also enable it to take the encoding schemes > that >>> dc:relations take, e.g. >>> >>> >>> >>> This "proper" solution takes us beyond conformance to the current XML >>> schema for qualified Dublin Core, so our thinking is that we don't >>> want to >>> implement a change like that, but rather wait for the revision of the > XML >>> schema for qualified DC (due out this year) that will support such >>> extensions. We are also not keen to go to all the work of defining and >>> implementing the olac:lingrelations extension (which includes writing a >>> document and putting it through the stages of the review process) for a >>> short-lived temporary solution. Thus, we have these new refinements on >>> the >>> list of changes for version 2.0 of our metadata format. 
>>> >>> -Gary >>> >>> Gary Simons, 03/05/2008 10:35 PM >>> Sent by: OLAC Implementers List >>> To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG >>> Subject: Call for review of new metadata documents >>> Please respond to: Open Language Archives Community Implementers List >>> >>> Dear implementers, >>> >>> Many of you also subscribe to the OLAC-GENERAL list and so have gotten >>> the >>> general announcement about this call for review for new metadata >>> documents. >>> Those of you who have implemented an OLAC data provider are directly >>> affected since this new work focuses on ways of improving the quality of >>> the >>> metadata in our implementations. In this message we repeat the general >>> announcement for the benefit of those not subscribed to OLAC-GENERAL, > and >>> then we supply further information that is relevant to you as >>> implementers. >>> >>> Six months ago the US National Science Foundation awarded funding for a >>> project named "OLAC: Accessing the World's Language Resources" which > aims >>> to >>> greatly improve access to language resources for linguists and the >>> broader >>> communities of interest. If you are interested in learning more about > the >>> project, you may visit the project home page at: >>> >>> http://olac.wiki.sourceforge.net/ >>> >>> In the first phase of the project we are focusing on improving metadata >>> quality as a prerequisite to improving the quality of search.
To that >>> end >>> we have drafted some new documents that can serve as a basis for >>> improving >>> and measuring metadata quality within our community: >>> >>> Best Practice Recommendations for Language Resource Description >>> http://www.language-archives.org/REC/bpr.html >>> >>> OLAC Metadata Usage Guidelines >>> http://www.language-archives.org/NOTE/usage.html >>> >>> OLAC Metadata Quality Metrics >>> http://www.language-archives.org/NOTE/metrics.html >>> >>> These documents have been reviewed in Draft status by the Metadata >>> Working >>> Group. After significant revision, they are now promoted to Proposed >>> status >>> and are thus ready for review by the entire community. In keeping with >>> the >>> OLAC Process standard, we hereby make a formal call for review. The >>> review >>> period will end on MARCH 31, at which point all of the comments that > have >>> been received will be processed to create revised versions of the >>> documents. >>> You may submit comments by simply replying to this message. >>> >>> The OLAC Metadata Standard that you followed in implementing your >>> repository >>> defines the constraints on validity for a metadata record, but it >>> gives no >>> advice about what a high quality metadata record is like. The first two >>> documents listed above address this issue. Then, in keeping with the >>> OLAC >>> core value of "Peer Review", we want to implement a service that will >>> measure conformance to the recommendations that can be automatically >>> tested >>> for. That is the issue addressed by the third document listed above. >>> >>> We have implemented the proposed Metadata Quality Score so that you >>> can see >>> the implications for your current metadata. (As the documents are > revised >>> to >>> express community consensus, the implementation of the metrics will be >>> updated to match.)
The metadata quality analysis as currently > implemented >>> is >>> accessible from a test version of the Participating Archives page. The >>> site >>> has no links to this test page; it is accessed by entering this URL in a >>> browser: >>> >>> http://www.language-archives.org/archives-new.php >>> >>> Follow the "Sample Record" link for your archive to see the quality > score >>> for the sample record named in your Identify response, along with >>> comments >>> on what can be done to improve the score. Follow the "Metrics" link to >>> see >>> the average quality score for the records you are currently providing. >>> Kudos to the Audio Archive of Linguistic Fieldwork (Berkeley), Centre de >>> Ressources pour la Description de l'Oral (CRDO), and the CHILDES Data >>> Repository who are already getting scores around 8 or higher. The >>> rest of >>> us have room for significant improvement! >>> >>> Eventually, this new Participating Archives page will replace the one >>> that >>> is currently accessed from the ARCHIVES link in the OLAC site banner. >>> However, this will not happen right away. After the current round of >>> review >>> and any subsequent revisions, the documents will be put to the OLAC >>> Council, >>> who will check the revised documents and promote them to Candidate > status >>> when they feel they are ready. Next we will issue a call for >>> implementation >>> and give at least one month for implementer feedback. Based on that >>> feedback, final revisions will be made to the satisfaction of the > Council >>> who will then grant Adopted status. The new Participating Archives page >>> will not replace the current one until the new guidelines and metrics > are >>> adopted. >>> >>> This discussion of process is to let you know that you will probably > want >>> to >>> plan to update the implementation of your metadata repository some time >>> within the next few months. 
When these new metadata recommendations and >>> usage guidelines are officially adopted, the public will be able to >>> see the >>> metrics scores for your repository. In the meantime, it is just other >>> implementers who are seeing them. You need not wait until the Candidate >>> call >>> for implementation to begin implementing changes. As soon as your >>> updated >>> repository is harvested, you will see the metrics change. >>> >>> Again, the review period will end on MARCH 31, at which point all of the >>> comments that have been received will be processed to create revised >>> versions of the documents. You may submit comments by replying to the >>> list >>> (and potentially entering into discussion with other implementers) or by >>> mailing them to . That account is handled by >>> Debbie >>> Chang, a Masters candidate at the Graduate Institute of Applied >>> Linguistics >>> who is the Research Assistant for our project. She will compile a >>> list of >>> all the comments (whether submitted to the list or to the project >>> account), >>> which the document editors will then be asked to respond to. That >>> response >>> will come after the review period closes. >>> >>> With a solid foundation based on quality metadata, our grant project > will >>> be >>> able to build improved search services and to expand coverage by >>> attracting >>> more participating archives and by implementing gateways to other >>> aggregators. We are grateful for your participation in this venture and >>> trust that you share our excitement about its potential. 
>>> >>> Best wishes, >>> Gary & Steven >>> >>> _______ >>> Steven Bird, University of Melbourne and University of Pennsylvania >>> Gary Simons, SIL International and GIAL >>> OLAC Coordinators (www.language-archives.org) >>> >>> > > -- > Helen Aristar-Dry > Professor of Linguistics > Director, Institute for Language Information and Technology (ILIT) > Eastern Michigan University > 2000 Huron River Rd., Suite 104 > Ypsilanti, MI 48197 > > 734.487.0144 (ILIT office) > 734.487.7952 (faculty office) > 734.482.0132 (fax) > hdry at linguistlist.org From jcgood at BUFFALO.EDU Sat Mar 29 20:15:30 2008 From: jcgood at BUFFALO.EDU (Jeff Good) Date: Sat, 29 Mar 2008 16:15:30 -0400 Subject: Reminder: Call for review of new metadata documents In-Reply-To: Message-ID: Dear Gary (and others), Thanks a lot for that clarification. The relationship between the mission statement and the granularity guidelines is much clearer to me now. I agree with Helen that the reasoning should be made explicit somewhere more prominent than this list. Your interpretation is not how I have interpreted the mission statement largely because I missed out on the significance of the word "library". I also find "virtual library" to be ambiguous between what one might call a "digital library" and what one might call an "aggregated library" (the latter sense being my label for your understanding of the OLAC use). I think it might be worth adding two questions to the FAQ (or answering these questions in some other appropriate place): (i) What does OLAC mean by "virtual library"? and (ii) What does OLAC mean by "language archive"? That should help a lot with possible ambiguities in the mission statement.
An important open issue, which is still not clear to me from your explanation, is whether OLAC is focusing on a "card catalog" now because that's all OLAC ever sees itself doing or if it, instead, views getting the card catalog part right as the first step towards a deeper kind of interoperability. (My reading of the mission would be that the latter interpretation is correct, but I already missed out on the importance of "library" in the mission. So, I'm probably missing several other points. I think the crucial point in this regard is understanding what level of interoperability one hopes to achieve with respect to the "interoperating repositories".) I don't think this is a merely pedantic issue right now because it matters a lot for how we "advertise" OLAC. Do we say, "OLAC is all about search!" (my simplification of something Helen said) or do we say, "OLAC aims for digital linguistic utopia starting with search!". (For what it's worth, I don't really care strongly about which path OLAC takes, but I would like to be confident I'm describing OLAC's goals correctly to other people.) > aggregated catalogs, and (3) maintaining a catalog to help users find > resources. The OLAC metadata standard is, of course, the > specification for > how to create an entry for the catalog. I'm actually somewhat confused by the fact that you say OLAC is maintaining a catalog of resources. It was my understanding that OLAC is right now only maintaining one kind of "catalog", but not one of resources. Rather, it maintains a list of participating archives. The full catalogs of resources (for linguists, at least) are maintained by the two service providers: LINGUIST and the LDC. (I know there are lots of connections between OLAC and these catalogs, but, strictly speaking, I didn't think OLAC was in the catalog maintenance business but, rather, defined a way through which a catalog could be maintained by outside parties.) > book, rather than an individual recorded session.
Thus I think this > interpretation of desired granularity is straightforwardly implied > by the > OLAC mission of creating a virtual library. If you have any ideas of > specific wording changes in the granularity guidelines that might > help to > clarify this, I'll be glad to hear them. I think your response already has all of the required points. Helen seemed to suggest adding an explanation to the mission statement. I'll let you and Steven decide if that's appropriate. (I'm not sure what the process is for adding explanations to the mission statement.) With respect to the guidelines, I recommend changing the first paragraph of the granularity discussion to something like the following (based on my understanding of your explanation): "Determining the right level for units to be described as language resources in the OLAC context involves multiple factors. The level of unit appropriate for inclusion in an aggregated catalog like OLAC's may be different (typically higher) than the level desirable for the catalog of a specific institution's holdings, which in turn is typically higher than the level desirable for describing the detailed contents of a resource. Consistent with its mission to create a virtual _library_ of language resources, a basic rule of thumb for making determinations regarding what kinds of units to treat as language resources should be that they should be comparable to the kinds of units treated as resources in a traditional library catalog. For example, libraries typically assign a single record to each book, not to each chapter within a book. A parallel example in the OLAC context would be treating all the objects associated with a particular field trip as a single unit rather than treating each of the individual resources created during that field trip as separate units. The following discussion is aimed at assisting an OLAC participant to find the right level of description." 
It might then be nice to give lots of concrete examples. Maybe you could get some of the participating archives to do this? One thing I deleted from that paragraph was the reference to the recommendation given in the Repository guidelines: "A metadata repository should not degrade the 'signal-to-noise ratio' for language resource discovery." I don't find this recommendation very helpful because (for me, at least) it is too dependent on what kinds of resources I want to discover. In other words, "language resource discovery" is too broad an activity for there to be one "signal-to-noise ratio". For example, if I already know I'm looking for resources on Nahuatl, I would probably not want to find a record saying, "There's a bunch of material on Nahuatl that's part of some bundle over at AILLA." The signal would be too weak for me--what I'd prefer is the search result I'd get from AILLA's catalog. Of course, for the next person, lots of detailed records about Nahuatl would constitute "noise". Signal and noise just don't strike me as constant enough to form the basis of a recommendation. I also don't like that this recommendation privileges language resource _discovery_ over other possible uses of the catalog. For example, library catalogs have at least one other function in addition to discovery: retrieval. Often I know a resource exists, but I don't know where it is, which is why I consult the catalog (this is my primary use of WorldCat, for example). (The word "discovery" is potentially ambiguous enough to cover "find something previously unknown" and "retrieve", but that's not my initial reading.) So, I would prefer a recommendation that was more agnostic regarding the use of the metadata. I personally find your new discussion of provenance in the metadata usage guidelines much more helpful than 'signal-to-noise ratio', since it's not dependent on particular uses of OLAC service providers.
So, I'd actually recommend revising the repository guidelines' present recommendation regarding granularity to something like: "A metadata repository should treat resources with a single provenance as constituting a single unit with respect to OLAC metadata and should, therefore, describe them within a single record." Another advantage to talking about granularity in terms of provenance in my view is that the current guidelines seem to be asking data providers to hypothesize about what search scenarios their data will be put to, but I don't think it's reasonable to expect data providers to be very good at this, or even to ask them to spend time thinking about this. That's a job for service providers. Framing the issue in terms of provenance allows data providers to use a kind of information they are, in principle, experts about to structure their collections, which is presumably a good way to achieve consistency. Furthermore, it allows service providers to be reasonably confident that they are aggregating records of the same basic kind from different service providers. It is thus more consonant with the overall OAI model wherein data providers and service providers interact in terms of a well-defined series of agreements without the one having to pay attention to the internal activities of the other. Jeff From hdry at LINGUISTLIST.ORG Sun Mar 30 14:20:53 2008 From: hdry at LINGUISTLIST.ORG (Helen Aristar-Dry) Date: Sun, 30 Mar 2008 10:20:53 -0400 Subject: Reminder: Call for review of new metadata documents In-Reply-To: <08921381-95C5-4081-A5BD-7430FF929B48@buffalo.edu> Message-ID: Extremely sensible remarks, Jeff. I agree especially with the points about 'signal to noise' ratio and think that Gary's remarks on provenance or your revision, which gives an example, would be much more helpful. -Helen Jeff Good wrote: > Dear Gary (and others), > > Thanks a lot for that clarification.
> The relationship between the > mission statement and the granularity guidelines is much clearer to me > now. I agree with Helen that the reasoning should be made explicit > somewhere more prominent than this list. Your interpretation is not how > I have interpreted the mission statement largely because I missed out on > the significance of the word "library". I also find "virtual library" to > be ambiguous between what one might call a "digital library" and what > one might call an "aggregated library" (the latter sense being my label > for your understanding of the OLAC use). > > I think it might be worth adding two questions to the FAQ (or answering > these questions in some other appropriate place): (i) What does OLAC > mean by "virtual library"? and (ii) What does OLAC mean by "language > archive"? That should help a lot with possible ambiguities in the > mission statement. > > An important open issue, which is still not clear to me from your > explanation, is whether OLAC is focusing on a "card catalog" now because > that's all OLAC ever sees itself doing or if it, instead, views getting > the card catalog part right as the first step towards a deeper kind of > interoperability. (My reading of the mission would be that the latter > interpretation is correct, but I already missed out on the importance of > "library" in the mission. So, I'm probably missing several other points. > I think the crucial point in this regard is understanding what level > of interoperability one hopes to achieve with respect to the > "interoperating repositories".) I don't think this is a merely pedantic > issue right now because it matters a lot for how we "advertise" OLAC. Do > we say, "OLAC is all about search!" (my simplification of something > Helen said) or do we say, "OLAC aims for digital linguistic utopia > starting with search!".
(For what it's worth, I don't really care > strongly about which path OLAC takes, but I would like to be confident > I'm describing OLAC's goals correctly to other people.) > > >> aggregated catalogs, and (3) maintaining a catalog to help users find >> resources. The OLAC metadata standard is, of course, the >> specification for >> how to create an entry for the catalog. > > I'm actually somewhat confused by the fact that you say OLAC is > maintaining a catalog of resources. It was my understanding that OLAC is > right now only maintaining one kind of "catalog", but not one of > resources. Rather, it maintains a list of participating archives. The > full catalogs of resources (for linguists, at least) are maintained by > the two service providers: LINGUIST and the LDC. (I know there are lots > of connections between OLAC and these catalogs, but, strictly speaking, > I didn't think OLAC was in the catalog maintenance business but, rather, > defined a way through which a catalog could be maintained by outside > parties.) > > >> book, rather than an individual recorded session. Thus I think this >> interpretation of desired granularity is straightforwardly implied by the >> OLAC mission of creating a virtual library. If you have any ideas of >> specific wording changes in the granularity guidelines that might help to >> clarify this, I'll be glad to hear them. > > I think your response already has all of the required points. Helen > seemed to suggest adding an explanation to the mission statement. I'll > let you and Steven decide if that's appropriate. (I'm not sure what the > process is for adding explanations to the mission statement.) > > With respect to the guidelines, I recommend changing the first paragraph > of the granularity discussion to something like the following (based on > my understanding of your explanation): > > "Determining the right level for units to be described as language > resources in the OLAC context involves multiple factors. 
> The level of > unit appropriate for inclusion in an aggregated catalog like OLAC's may > be different (typically higher) than the level desirable for the catalog > of a specific institution's holdings, which in turn is typically higher > than the level desirable for describing the detailed contents of a > resource. Consistent with its mission to create a virtual _library_ of > language resources, a basic rule of thumb for making determinations > regarding what kinds of units to treat as language resources should be > that they should be comparable to the kinds of units treated as > resources in a traditional library catalog. For example, libraries > typically assign a single record to each book, not to each chapter > within a book. A parallel example in the OLAC context would be treating > all the objects associated with a particular field trip as a single unit > rather than treating each of the individual resources created during > that field trip as separate units. The following discussion is aimed at > assisting an OLAC participant to find the right level of description." > > It might then be nice to give lots of concrete examples. Maybe you could > get some of the participating archives to do this? > > One thing I deleted from that paragraph was the reference to the > recommendation given in the Repository guidelines: > "A metadata repository should not degrade the 'signal-to-noise ratio' > for language resource discovery." > > I don't find this recommendation very helpful because (for me, at least) > it is too dependent on what kinds of resources I want to discover. In > other words, "language resource discovery" is too broad an activity for > there to be one "signal-to-noise ratio". For example, if I already know > I'm looking for resources on Nahuatl, I would probably not want to find > a record saying, "There's a bunch of material on Nahuatl that's part of > some bundle over at AILLA."
> The signal would be too weak for me--what > I'd prefer is the search result I'd get from AILLA's catalog. Of course, > for the next person, lots of detailed records about Nahuatl would > constitute "noise". Signal and noise just don't strike me as constant > enough to form the basis of a recommendation. > > I also don't like that this recommendation privileges language resource > _discovery_ over other possible uses of the catalog. For example, > library catalogs have at least one other function in addition to > discovery: retrieval. Often I know a resource exists, but I don't know > where it is, which is why I consult the catalog (this is my primary use > of WorldCat, for example). (The word "discovery" is potentially > ambiguous enough to cover "find something previously unknown" and > "retrieve", but that's not my initial reading.) So, I would prefer a > recommendation that was more agnostic regarding the use of the metadata. > > I personally find your new discussion of provenance in the metadata > usage guidelines much more helpful than 'signal-to-noise ratio', since > it's not dependent on particular uses of OLAC service providers. So, I'd > actually recommend revising the repository guidelines' present > recommendation regarding granularity to something like: > > "A metadata repository should treat resources with a single provenance > as constituting a single unit with respect to OLAC metadata and should, > therefore, describe them within a single record." > > Another advantage to talking about granularity in terms of provenance in > my view is that the current guidelines seem to be asking data providers > to hypothesize about what search scenarios their data will be put to, > but I don't think it's reasonable to expect data providers to be very > good at this, or even to ask them to spend time thinking about this. > That's a job for service providers.
> Framing the issue in terms of > provenance allows data providers to use a kind of information they are, > in principle, experts about to structure their collections, which is > presumably a good way to achieve consistency. Furthermore, it allows > service providers to be reasonably confident that they are aggregating > records of the same basic kind from different service providers. It is > thus more consonant with the overall OAI model wherein data providers > and service providers interact in terms of a well-defined series of > agreements without the one having to pay attention to the internal > activities of the other. > > Jeff From Gary_Simons at SIL.ORG Mon Mar 31 15:20:47 2008 From: Gary_Simons at SIL.ORG (Gary Simons) Date: Mon, 31 Mar 2008 10:20:47 -0500 Subject: Last call for review of new metadata documents In-Reply-To: Message-ID: Dear implementers, Today being the stated last day of the review period, this is the last call for comments for the documents on metadata usage guidelines and metrics. (The original call with the URLs is appended.) That is not to say that we won't accept comments after today. If you come through with comments in the next few days, we'll gladly receive them. However, we'll start working on the next phase of the process soon. Thanks to Jeff and Helen for stimulating discussion on the granularity issue. We have good feedback for revising the granularity section of the usage guidelines. Jeff's proposal about replacing the signal-to-noise ratio statement with one centered on provenance is a good one. That impacts the OLAC Process standard since that is where signal-to-noise is stated as a principle for judging new registration applications.
A light revision of our standards is in the wings as part of moving from 1.0 to 1.1 of the metadata standard, so we can address that point then. I also like the suggestion of using the FAQ to bring out these explanations of what is implied by "virtual library" in the mission statement. Jeff raises a question that I should answer, namely, "I'm actually somewhat confused by the fact that you say OLAC is maintaining a catalog of resources. It was my understanding that OLAC is right now only maintaining one kind of "catalog", but not one of resources. Rather, it maintains a list of participating archives. The full catalogs of resources (for linguists, at least) are maintained by the two service providers: LINGUIST and the LDC." While the LDC and Linguist search engines are the visible faces of search (and still others are possible given the OAI-PMH model), an important (but not so visible) service provided centrally by OLAC is the OLAC Aggregator. OLAC runs an incremental harvest every 12 hours of all registered repositories and offers a single aggregated catalog to the world via the OAI-PMH at the following base URL (which actually generates a useful documentation page if you visit it): http://www.language-archives.org/cgi-bin/olaca3.pl This is where, for instance, the mandatory OAI_DC metadata format of the OAI-PMH is implemented. In static repositories, data providers give only OLAC metadata, but OLAC plugs them into the wider OAI-PMH world by providing the crosswalk to OAI_DC format as a value-added service in the single OLACA repository. All of the new work with metrics and quality checks is also based on the aggregated catalog. OLAC does not "maintain" an original catalog in the same way that each data provider maintains its catalog; but we are maintaining the aggregated catalog of the virtual library by harvesting every day to keep it up to date and doing checks to maintain quality.
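To make the harvesting described above more concrete, here is a minimal sketch of how a service provider might page through the aggregated catalog using OAI-PMH ListRecords requests against the OLACA base URL. It assumes the standard OAI-PMH 2.0 arguments (verb, metadataPrefix, from, resumptionToken) and assumes that OLACA advertises the metadata prefix `olac` (a real harvester should confirm this with a ListMetadataFormats request first). The function names are hypothetical, and error handling (OAI-PMH error codes, HTTP 503 retry-after) is omitted.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
OLACA_BASE = "http://www.language-archives.org/cgi-bin/olaca3.pl"


def list_records_url(base_url, metadata_prefix="olac", from_date=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    In OAI-PMH 2.0, resumptionToken is an exclusive argument: when it is
    present, no other argument except verb may be sent.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Selective (incremental) harvesting: only records added or
            # changed on or after this YYYY-MM-DD date are returned.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumptionToken from a ListRecords response (bytes or str),
    or None when the list is complete (token absent or empty)."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def harvest_pages(fetch, base_url, metadata_prefix="olac", from_date=None):
    """Yield each ListRecords response page, following resumption tokens.

    `fetch` is any callable that takes a URL and returns the response body.
    """
    url = list_records_url(base_url, metadata_prefix, from_date)
    while True:
        page = fetch(url)
        yield page
        token = next_token(page)
        if token is None:
            break
        url = list_records_url(base_url, resumption_token=token)


def fetch_url(url):
    """Plain network fetch; returns the raw response bytes."""
    with urlopen(url) as resp:
        return resp.read()
```

For example, `for page in harvest_pages(fetch_url, OLACA_BASE, from_date="2008-03-30"): ...` would, under these assumptions, retrieve only records changed since that date, which is the same idea as the incremental harvest OLAC runs against each registered repository. Passing `fetch` in as a parameter keeps the paging logic testable without network access.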
OLACA can also serve as the single point of contact for anyone who wants to implement a service based on OLAC metadata. There are two possible approaches: independently run the OLAC harvester (and create one's own aggregated catalog), or simply harvest from the pre-aggregated OLACA data provider.

There is another question that begs for an answer: Do we say, "OLAC is all about search!" (my simplification of something Helen said), or do we say, "OLAC aims for digital linguistic utopia, starting with search!"? The latter statement is closer, if you substitute "descriptive metadata" for "search". But I hasten to add that when that utopia is reached, we won't call the result OLAC, just as we don't confuse the web with the W3C.

My plenary talk for last summer's workshop, "Toward the interoperability of language resources," paints a picture of such a utopia as an interoperating cyberinfrastructure for linguistics (and gives a diagram on the closing slide):

http://linguistlist.org/tilr/papers/TILR%20Plenary%20Slides.pdf

Of the 12 elements in the infrastructure, elements 1 through 4 (Aggregator, Metadata standard, Submission protocol, and Harvesting protocol) are specifically identified as being OLAC's contribution. Those standards are what make it possible for the other 8 elements to be built in such a way that they interoperate with each other, at least at the common-denominator level defined by the metadata standard. The OLAC process also provides a way for the community developing this infrastructure to define additional standards that promote community interoperation, but the vision also includes specialized subcommunities getting together to define more specific standards for their own areas of focus.

I hope that helps clarify things.
Best,
-Gary

From jcgood at BUFFALO.EDU Thu Mar 6 15:44:40 2008
From: jcgood at BUFFALO.EDU (Jeff Good)
Date: Thu, 6 Mar 2008 10:44:40 -0500
Subject: Call for review of new metadata documents
In-Reply-To: 
Message-ID: 

Dear OLAC-Implementers,

First, let me thank Gary and Steven for pulling together all these comments and making the usage guidelines revisions. It's great to see these things moving forward. It will take me some time to put together all of my comments on the revised guidelines, but I have one technical question now that I'm hoping those better informed about Dublin Core can answer.
It's agreed that the isTranscriptOf and hasTranscriptOf relations are needed, but the conclusion is that we can't do anything about this in revision 1.1 and that it has to be held off until revision 2. What I don't understand is why we can't use the existing model of OLAC controlled vocabulary refinements for this in the meantime. For example, why can't we use something like the second element, which looks to me to be mostly parallel to the first element, which comes out of the guidelines, using the prescribed method for encoding subject language:

some-unique-identifier

I guess the problem here is that "olac:code" in the second case would not be encoding a "thing" but a "relation". But is that so bad? Could we just call this "olac:predicates" to deal with this? The OLAC->DC mapping would still be straightforward (just strip out the "olac:" attributes), right?

If this isn't possible for some technical reason, however, perhaps we can get a jump start on what will be an important feature of OLAC 2.0 by having someone draft the relevant document that will be needed to describe these refinements at some point anyway?

I also wonder why, as a stopgap measure, there can't be a recommendation about how to encode this in the meantime, even if it is not officially part of the standard. For example, can't we informally agree to at least do something like this:

IsTranscriptOf: some-unique-identifier

At the very least, this should help people prepare for the fact that in OLAC 2.0 there will be an official way to code this.

Jeff

From Gary_Simons at SIL.ORG Tue Mar 25 02:16:52 2008
From: Gary_Simons at SIL.ORG (Gary Simons)
Date: Mon, 24 Mar 2008 21:16:52 -0500
Subject: Reminder: Call for review of new metadata documents
In-Reply-To: 
Message-ID: 

Dear implementers,

This is a reminder that we have one week left in the review period for the documents listed in the attached message. We are anxiously awaiting your feedback!
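Jeff's question about whether the OLAC-to-DC mapping amounts to stripping out the "olac:" attributes can be illustrated with a naive crosswalk. This is a hypothetical Python sketch, not the crosswalk OLACA actually runs: the namespace URIs, the `olac:olac` root element and the sample record are assumptions, and only a few DCMI refinements are mapped to the Dublin Core element they refine. It suggests that dropping attributes is indeed easy, but also that any information carried only in an attribute (such as an `olac:code` on an element with no text content) would be lost.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for this sketch.
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# A few DCMI refinements and the DC element each one refines (partial list).
REFINES = {
    "isVersionOf": "relation",
    "hasVersion": "relation",
    "isPartOf": "relation",
    "hasPart": "relation",
    "created": "date",
    "issued": "date",
    "abstract": "description",
    "alternative": "title",
    "spatial": "coverage",
}


def naive_crosswalk(olac_record_xml: str) -> list[tuple[str, str]]:
    """Map each child of an OLAC record to (dc_element, text), ignoring all attributes."""
    root = ET.fromstring(olac_record_xml)
    out = []
    for child in root:
        if not child.tag.startswith("{"):
            continue  # un-namespaced elements are skipped in this sketch
        # Parsed tags look like "{namespace-uri}localname".
        ns, _, local = child.tag[1:].partition("}")
        if ns == DC:
            name = local
        elif ns == DCTERMS and local in REFINES:
            name = REFINES[local]
        else:
            continue  # unmapped refinement: skipped in this sketch
        # xsi:type and olac:code are simply not copied, so anything
        # carried only in them does not survive the mapping.
        out.append((name, (child.text or "").strip()))
    return out
```

A crosswalk used in production would presumably need to render such codes into the text content (or otherwise preserve them) rather than silently dropping them.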
So far we have gotten just one comment, namely, from Jeff Good asking about the possibility of using a solution like the following for isTranscriptOf and hasTranscript:

Such a solution would be possible, but since isTranscriptOf is analogous to isVersionOf (and the other refinements of dc:relation), it really should be a new element (in the olac namespace) that is defined as a refinement of dc:relation, which would also enable it to take the encoding schemes that dc:relations take, e.g.

This "proper" solution takes us beyond conformance to the current XML schema for qualified Dublin Core, so our thinking is that we don't want to implement a change like that, but rather wait for the revision of the XML schema for qualified DC (due out this year) that will support such extensions. We are also not keen to go to all the work of defining and implementing the olac:lingrelations extension (which includes writing a document and putting it through the stages of the review process) for a short-lived temporary solution. Thus, we have these new refinements on the list of changes for version 2.0 of our metadata format.

-Gary

From jcgood at BUFFALO.EDU Tue Mar 25 23:03:25 2008
From: jcgood at BUFFALO.EDU (Jeff Good)
Date: Tue, 25 Mar 2008 19:03:25 -0400
Subject: Reminder: Call for review of new metadata documents
In-Reply-To: 
Message-ID: 

Dear Gary,

Thanks for the clarification regarding the Relation element. It's too bad we're stuck waiting for DC to finish its process.
Would it make sense for us to start the document process for this refinement before they officially release the new schema for qualified Dublin Core? Then we could take advantage of it quickly once it's official.

I have one other question about the new documents at this point, again regarding granularity. The new discussion is, I think, quite welcome and sufficiently detailed and clear to be put to use. I still find that a bit of context is missing, though. The background assumption seems to be that OLAC metadata is intended for certain kinds of search (I don't know of a good way to define those kinds of search other than to say they are approximately Google-like). This certainly has been an assumption driving OLAC for quite some time. The problem, as I see it, is that nowhere in the OLAC documents (that I'm aware of) is this assumption explicitly laid out. Perhaps I'm the only one who reads these things, but the Mission statement (pasted here) doesn't even explicitly talk about search at all:

"OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources."

Since the current granularity recommendations are only indirectly connected to the Mission, it would be nice if the relevant rationale for them were given.
In fact, to be honest, I'm not sure what the rationale is precisely since I can imagine two fairly distinct ones: (i) that OLAC's mission has changed and its primary focus is to serve as a bridge between linguistic repositories and digital library initiatives like OAI (an excellent mission, if more limited than the current one) or (ii) that OLAC has determined that the most useful step it can make towards its ultimate mission at present is to facilitate language resource discovery in an OAI context.

While clarifying this issue is perhaps not all that important to move forward with current work, it obviously could be pretty important down the road, in particular as search technologies and our ideas about what we want to search for and how we want to do it change.

Jeff

From jcgood at BUFFALO.EDU Tue Mar 25 23:54:16 2008
From: jcgood at BUFFALO.EDU (Jeff Good)
Date: Tue, 25 Mar 2008 19:54:16 -0400
Subject: Specifying content of elements specifying languages
In-Reply-To: 
Message-ID: 

Dear Gary (and others),

I also wanted to re-raise an issue regarding the nature of the text content of elements specifying the languages of a resource (though the issue relates primarily to subject languages, not the description languages).
The resolution is described as follows, emphasis added:

"There are two different issues here. The first regards clarifying the nature of the text content. The first comment points out that there are two main uses of the text content (for an alternate name or for a variety name) and asks if there should be a way to distinguish these two cases. *This can be done by means of the wording of the text content; the most straightforward approach is to add the word "dialect" after the name in the case of a variety name.* An example like this is given in the document. Other cases of indicating a variety, such as "Women's speech," don't involve a name at all and so do not pose a problem. Still other cases of using the content (like the note that Heidi Johnson gives above as an example) can include multiple names, including both alternate names and variety names."

I was actually hoping for something stronger than what's given in the highlighted (with "**") sentence. It has been possible for some time to specify any number of fine-grained details in the element content. What concerned me was that a few kinds of language refinement seem likely to be frequent enough that it may make sense to have standardized conventions for encoding them. "Dialect" is the most obvious one. Of course, without standardized codes for dialects it would still be hard for people to search for a particular dialect, but it would be good if there were a standardized way to see that a resource represents some dialect. For large languages, for example, people might care a bit about which dialect they're getting.

I can think of two ways to do such standardization, one easier than the other. The easy way is just to stipulate how to indicate, in the element content, that a name refers to a dialect. This could be as simple as saying the name should be followed by the word "dialect" (as opposed to, say, being followed by "variety" or being preceded by "dialect: ").
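Purely as an illustration of how a harvester might consume the first, suffix-based convention (which at this point is only a proposal, not adopted OLAC practice), here is a minimal Python sketch. The function name `parse_language_content` and the exact rule (a trailing word "dialect", matched case-insensitively) are assumptions made for the example.

```python
import re
from typing import Optional, Tuple

# Hypothetical rule: a variety name is marked by a trailing word "dialect"
# in the text content of a language element (the "easy way" above).
_DIALECT_SUFFIX = re.compile(r"(?P<name>.+?)\s+dialect", re.IGNORECASE)


def parse_language_content(text: str) -> Tuple[str, Optional[str]]:
    """Split language-element text content into (name, refinement).

    Returns (name, "dialect") when the content follows the assumed suffix
    convention; otherwise returns (content, None), so alternate names and
    free-text notes such as "Women's speech" pass through unchanged.
    """
    # Collapse runs of whitespace so "Saracatsan   dialect" is handled too.
    content = " ".join(text.split())
    match = _DIALECT_SUFFIX.fullmatch(content)
    if match:
        return match.group("name"), "dialect"
    return content, None
```

For example, `parse_language_content("Saracatsan dialect")` yields `("Saracatsan", "dialect")`, while `"Women's speech"` passes through unchanged. Because the rule is purely textual, a free-text note that happens to end in "dialect" would also be split, which is one argument for an explicit attribute-based marker like the one proposed next.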
The second would be to add a possible refinement attribute, let's call it olac:refinement, with a controlled vocabulary consisting of, for example, "dialect" and "alternate". Thus, we would adapt this guidelines example:

    Saracatsan dialect

To this:

    Saracatsan

I don't know the DC restrictions well enough to know if this is appropriate. Maybe it falls under the rubric of qualified Dublin Core, in which case nothing can be easily done right now.

Jeff

From hdry at LINGUISTLIST.ORG  Wed Mar 26 13:12:18 2008
From: hdry at LINGUISTLIST.ORG (Helen Aristar-Dry)
Date: Wed, 26 Mar 2008 09:12:18 -0400
Subject: Reminder: Call for review of new metadata documents
In-Reply-To: <78BA080F-24E5-4A23-BD3D-1AF352B40436@buffalo.edu>

Hello, Gary (and all),

I just wanted to second both of Jeff's points. I realize that I have always assumed that OLAC metadata is designed to facilitate resource discovery, not full resource description (which might be left to the IMDI metadata set or to another, more elaborate standard). I recall mentions of the fact that a researcher typically wants to find anything written on an endangered language, so just knowing the language code of some resources may be adequate. And other discussions seem to have assumed that an archive will most likely not use OLAC metadata as its primary metadata set, but rather export a subset of its descriptive, technical, and administrative metadata in OLAC format.

It seems to me we routinely talk of OLAC as though its primary purpose is resource discovery. This is a perfectly justifiable and reasonable mission, as Jeff notes below. It has the advantage of (a) being doable and (b) filling a niche. But I do think the mission statement should reflect it. Such clarity would be helpful to those of us who routinely try to promote OLAC.
Even within the context of resource discovery, however, 'hasTranscript' would seem to be an important descriptor. In a typical linguist's collection, where nine-tenths of the recordings have not been transcribed, 'hasTranscript' would distinguish those that another researcher would most want to find. I can understand your not wanting to do a lot of work to produce a temporary solution, of course. But this is something that has been frequently requested, so maybe OLAC could put it on some 'must-do' list.

And thank you for all the work you and Joan and Steven are doing. I think the OLAC Users' Guide is a very helpful and well-conceived document.

All the best from snowy Michigan.

-Helen

Jeff Good wrote:
> Dear Gary,
>
> Thanks for the clarification regarding the Relation element. It's too bad we're stuck waiting for DC to finish its process. Would it make sense for us to start the document process for this refinement before they officially release the new schema for qualified Dublin Core? Then we could take advantage of it quickly once it's official.
>
> I have one other question about the new documents at this point, again regarding granularity. The new discussion I think is quite welcome and sufficiently detailed and clear to be put to use. I still find that a bit of context is missing though. The background assumption seems to be that OLAC metadata is intended for certain kinds of search (I don't know of a good way to define those kinds of search other than to say they are approximately Google-like). This certainly has been an assumption driving OLAC for quite some time. The problem, as I see it, is that nowhere in the OLAC documents (that I'm aware of) is this assumption explicitly laid out.
>
> Perhaps I'm the only one who reads these things, but the Mission statement (pasted here) doesn't even explicitly talk about search at all:
>
> "OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources."
>
> Since the current granularity recommendations are only indirectly connected to the Mission, it would be nice if the relevant rationale for them were given. In fact, to be honest, I'm not sure what the rationale is precisely since I can imagine two fairly distinct ones: (i) that OLAC's mission has changed and its primary focus is to serve as a bridge between linguistic repositories and digital library initiatives like OAI (an excellent mission, if more limited than the current one) or (ii) that OLAC has determined that the most useful step it can make towards its ultimate mission at present is to facilitate language resource discovery in an OAI context.
>
> While clarifying this issue is perhaps not all that important to move forward with current work, it obviously could be pretty important down the road, in particular as search technologies and our ideas about what we want to search for and how we want to do it change.
>
> Jeff
>
> On Mar 24, 2008, at 10:16 PM, Gary Simons wrote:
>
>> Dear implementers,
>>
>> This is a reminder that we have one week left in the review period for the documents listed in the attached message. We are anxiously awaiting your feedback!
>>
>> So far we have gotten just one comment, namely, from Jeff Good asking about the possibility of using a solution like the following for isTranscriptOf and hasTranscript:
>>
>>
>>
>> Such a solution would be possible, but since isTranscriptOf is analogous to isVersionOf (and the other refinements of dc:relation), it really should be a new element (in the olac namespace) that is defined as a refinement of dc:relation, which would also enable it to take the encoding schemes that dc:relations take, e.g.
>>
>>
>>
>> This "proper" solution takes us beyond conformance to the current XML schema for qualified Dublin Core, so our thinking is that we don't want to implement a change like that, but rather wait for the revision of the XML schema for qualified DC (due out this year) that will support such extensions. We are also not keen to go to all the work of defining and implementing the olac:lingrelations extension (which includes writing a document and putting it through the stages of the review process) for a short-lived temporary solution. Thus, we have these new refinements on the list of changes for version 2.0 of our metadata format.
>>
>> -Gary
-- 
Helen Aristar-Dry
Professor of Linguistics
Director, Institute for Language Information and Technology (ILIT)
Eastern Michigan University
2000 Huron River Rd., Suite 104
Ypsilanti, MI 48197
734.487.0144 (ILIT office)
734.487.7952 (faculty office)
734.482.0132 (fax)
hdry at linguistlist.org

From Gary_Simons at SIL.ORG  Fri Mar 28 04:15:51 2008
From: Gary_Simons at SIL.ORG (Gary Simons)
Date: Thu, 27 Mar 2008 23:15:51 -0500
Subject: Reminder: Call for review of new metadata documents
In-Reply-To: <47EA4BB2.5030201@linguistlist.org>

Jeff and Helen,

I do think the mission statement speaks to the issue you are asking about, but it is clearly implicit and wrapped up in what I hope is a shared understanding of the term "library". As the mission statement says, the purpose of OLAC is to "create a virtual library of language resources." A simplistic model of what a library does is that it: (1) builds a collection of resources, (2) curates that collection over the long term, and (3) maintains a catalog that helps its users find the resources that are relevant to them.

Since OLAC is a virtual library it doesn't need to do point (2) of curating a collection--each of our participating archives is doing that for their piece of the virtual collection.
But OLAC is (1) building the virtual collection by recruiting more archives to participate and (as our current grant project unfolds) developing gateways to other aggregated catalogs, and (3) maintaining a catalog to help users find resources. The OLAC metadata standard is, of course, the specification for how to create an entry for the catalog.

The discovery goals that the granularity guidelines reflect are based on what we would typically expect from a library catalog. When it comes to books, for instance, the library catalog helps us find a book that we can judge to be potentially relevant based on title and author and subject and the like, but it does not give us the detailed table of contents. We have to open the book to find that. Similarly, the catalog for a library or archive typically treats something like a collection of field notes and recordings (that have the same provenance) as a single item (which is why we have DCMI type Collection). If the catalog record sounds relevant, then we have to open the collection to find the detailed table of contents. In library cataloging practice, it is the collection that is analogous to a book, rather than an individual recorded session. Thus I think this interpretation of desired granularity is straightforwardly implied by the OLAC mission of creating a virtual library. If you have any ideas of specific wording changes in the granularity guidelines that might help to clarify this, I'll be glad to hear them.

The current growth edge of work in the OAI is on developing a standard for describing the detailed contents of a compound object in an interoperable way. It is called OAI-ORE (for Object Reuse and Exchange), currently released in an alpha version:

http://www.openarchives.org/ore/

It does not replace the OAI_DC description or change the basic catalog. Rather, when available, it is a second description of a resource that identifies all of its components and how they function and relate to each other.
It can be used to implement services that make more intelligent use of resources. Once we have done a conversion to the new style of qualified DC description and once the OAI-ORE spec is established, we may well want to work on OLAC guidelines for applying OAI-ORE so that we can intelligently handle compound objects.

Best,
-Gary
-- 
Helen Aristar-Dry
Professor of Linguistics
Director, Institute for Language Information and Technology (ILIT)
Eastern Michigan University
2000 Huron River Rd., Suite 104
Ypsilanti, MI 48197

734.487.0144 (ILIT office)
734.487.7952 (faculty office)
734.482.0132 (fax)
hdry at linguistlist.org

From hdry at LINGUISTLIST.ORG  Fri Mar 28 14:19:12 2008
From: hdry at LINGUISTLIST.ORG (Helen Aristar-Dry)
Date: Fri, 28 Mar 2008 10:19:12 -0400
Subject: Reminder: Call for review of new metadata documents
In-Reply-To: 
Message-ID: 

That's a great explanation, Gary; and I'll buy everything in it except that it is "straightforwardly implied"! (Great oxymoronic phrase!)

Seriously, I will buy this reasoning when I read it, as below; and, as you know, I think that resource discovery IS the right focus for OLAC. But I don't think that everyone gets that from the mission statement. Why don't you just add part of the explanation below to the mission statement and clarify it for everyone? It could read:

"... create a virtual library of language resources through building a collection and maintaining a resource catalog. OLAC is (1) building a virtual collection through archive recruitment and the development of gateways to other aggregated catalogs and (2) maintaining an online metadata catalog to aid in resource discovery. The OLAC metadata standard is the specification for creating an entry for the catalog."

But maybe Jeff will have a better idea.

Thanks,
-Helen

Gary Simons wrote:
> Jeff and Helen,
>
> I do think the mission statement speaks to the issue you are asking about, but it is clearly implicit and wrapped up in what I hope is a shared understanding of the term "library". As the mission statement says, the purpose of OLAC is to ""
>
> A simplistic model of what a library does is that it: (1) builds a collection of resources, (2) curates that collection over the long term, and (3) maintains a catalog that helps its users find the resources that are relevant to them. Since OLAC is a virtual library, it doesn't need to do point (2) of curating a collection; each of our participating archives is doing that for their piece of the virtual collection. But OLAC is (1) building the virtual collection by recruiting more archives to participate and (as our current grant project unfolds) developing gateways to other aggregated catalogs, and (3) maintaining a catalog to help users find resources. The OLAC metadata standard is, of course, the specification for how to create an entry for the catalog.
>
> The discovery goals that the granularity guidelines reflect are based on what we would typically expect from a library catalog. When it comes to books, for instance, the library catalog helps us find a book that we can judge to be potentially relevant based on title and author and subject and the like, but it does not give us the detailed table of contents. We have to open the book to find that. Similarly, the catalog for a library or archive typically treats something like a collection of field notes and recordings (that have the same provenance) as a single item (which is why we have DCMI type Collection). If the catalog record sounds relevant, then we have to open the collection to find the detailed table of contents.
> In library cataloging practice, it is the collection that is analogous to a book, rather than an individual recorded session. Thus I think this interpretation of desired granularity is straightforwardly implied by the OLAC mission of creating a virtual library. If you have any ideas of specific wording changes in the granularity guidelines that might help to clarify this, I'll be glad to hear them.
>
> The current growth edge of work in the OAI is on developing a standard for describing the detailed contents of a compound object in an interoperable way. It is called OAI-ORE (for Object Reuse and Exchange), currently released in an alpha version:
>
> http://www.openarchives.org/ore/
>
> It does not replace the OAI_DC description or change the basic catalog. Rather, when available, it is a second description of a resource that identifies all of its components and how they function and relate to each other. It can be used to implement services that make more intelligent use of resources. Once we have done a conversion to the new style of qualified DC description and once the OAI-ORE spec is established, we may well want to work on OLAC guidelines for applying OAI-ORE so that we can intelligently handle compound objects.
>
> Best,
> -Gary
>
> Helen Aristar-Dry
> Sent by: OLAC Implementers List
> To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG
> Subject: Re: Reminder: Call for review of new metadata documents
> 03/26/2008 08:12 AM
> Please respond to Open Language Archives Community Implementers List
>
> Hello, Gary (and all),
>
> I just wanted to second both Jeff's points. I realize that I have always assumed that OLAC metadata is designed to facilitate resource discovery, not full resource description (which might be left to the IMDI metadata set, or another more elaborated standard).
> I recall mentions of the fact that a researcher typically wants to find anything written on an endangered language, so just knowing the language code of some resources may be adequate. And other discussions seem to have assumed that an archive will most likely not use OLAC metadata as its primary metadata set, but rather export a subset of its descriptive and technical and administrative metadata in OLAC format. It seems to me we routinely talk of OLAC as though its primary purpose is resource discovery. This is a perfectly justifiable and reasonable mission, as Jeff notes below. It has the advantage of (a) being doable and (b) filling a niche. But I do think the mission statement should reflect it. Such clarity would be helpful to those of us who routinely try to promote OLAC.
>
> Even within the context of resource discovery, however, 'hasTranscript' would seem to be an important descriptor. In a typical linguist's collection, where nine-tenths of the recordings have not been transcribed, 'hasTranscript' would distinguish those that another researcher would most want to find. I can understand your not wanting to do a lot of work to produce a temporary solution, of course. But this is something that has been frequently requested, so maybe OLAC could put it on some 'must-do' list.
>
> And thank you for all the work you and Joan and Steven are doing. I think the OLAC Users' Guide is a very helpful and well-conceived document.
>
> All the best from snowy Michigan.
> -Helen
>
> Jeff Good wrote:
>> Dear Gary,
>>
>> Thanks for the clarification regarding the Relation element. It's too bad we're stuck waiting for DC to finish its process. Would it make sense for us to start the document process for this refinement before they officially release the new schema for qualified Dublin Core? Then we could take advantage of it quickly once it's official.
>>
>> I have one other question about the new documents at this point, again regarding granularity. The new discussion I think is quite welcome and sufficiently detailed and clear to be put to use. I still find that a bit of context is missing, though. The background assumption seems to be that OLAC metadata is intended for certain kinds of search (I don't know of a good way to define those kinds of search other than to say they are approximately Google-like). This certainly has been an assumption driving OLAC for quite some time. The problem, as I see it, is that nowhere in the OLAC documents (that I'm aware of) is this assumption explicitly laid out.
>>
>> Perhaps I'm the only one who reads these things, but the Mission statement (pasted here) doesn't even explicitly talk about search at all:
>>
>> "OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources."
>>
>> Since the current granularity recommendations are only indirectly connected to the Mission, it would be nice if the relevant rationale for them were given. In fact, to be honest, I'm not sure what the rationale is precisely, since I can imagine two fairly distinct ones: (i) that OLAC's mission has changed and its primary focus is to serve as a bridge between linguistic repositories and digital library initiatives like OAI (an excellent mission, if more limited than the current one) or (ii) that OLAC has determined that the most useful step it can make towards its ultimate mission at present is to facilitate language resource discovery in an OAI context.
>>
>> While clarifying this issue is perhaps not all that important to move forward with current work, it obviously could be pretty important down the road, in particular as search technologies and our ideas about what we want to search for and how we want to do it change.
>>
>> Jeff
>>
>> On Mar 24, 2008, at 10:16 PM, Gary Simons wrote:
>>
>>> Dear implementers,
>>>
>>> This is a reminder that we have one week left in the review period for the documents listed in the attached message. We are anxiously awaiting your feedback!
>>>
>>> So far we have gotten just one comment, namely, from Jeff Good asking about the possibility of using a solution like the following for isTranscriptOf and hasTranscript:
>>>
>>>
>>>
>>> Such a solution would be possible, but since isTranscriptOf is analogous to isVersionOf (and the other refinements of dc:relation), it really should be a new element (in the olac namespace) that is defined as a refinement of dc:relation, which would also enable it to take the encoding schemes that dc:relation takes, e.g.
>>>
>>>
>>>
>>> This "proper" solution takes us beyond conformance to the current XML schema for qualified Dublin Core, so our thinking is that we don't want to implement a change like that, but rather wait for the revision of the XML schema for qualified DC (due out this year) that will support such extensions. We are also not keen to go to all the work of defining and implementing the olac:lingrelations extension (which includes writing a document and putting it through the stages of the review process) for a short-lived temporary solution. Thus, we have these new refinements on the list of changes for version 2.0 of our metadata format.
>>>
>>> -Gary

From jcgood at BUFFALO.EDU  Sat Mar 29 20:15:30 2008
From: jcgood at BUFFALO.EDU (Jeff Good)
Date: Sat, 29 Mar 2008 16:15:30 -0400
Subject: Reminder: Call for review of new metadata documents
In-Reply-To: 
Message-ID: 

Dear Gary (and others),

Thanks a lot for that clarification. The relationship between the mission statement and the granularity guidelines is much clearer to me now. I agree with Helen that the reasoning should be made explicit somewhere more prominent than this list. Your interpretation is not how I have interpreted the mission statement largely because I missed out on the significance of the word "library". I also find "virtual library" to be ambiguous between what one might call a "digital library" and what one might call an "aggregated library" (the latter sense being my label for your understanding of the OLAC use).

I think it might be worth adding two questions to the FAQ (or answering these questions in some other appropriate place): (i) What does OLAC mean by "virtual library"? and (ii) What does OLAC mean by "language archive"? That should help a lot with possible ambiguities in the mission statement.

An important open issue, which is still not clear to me from your explanation, is whether OLAC is focusing on a "card catalog" now because that's all OLAC ever sees itself doing or if it, instead, views getting the card catalog part right as the first step towards a deeper kind of interoperability. (My reading of the mission would be that the latter interpretation is correct, but I already missed out on the importance of "library" in the mission. So, I'm probably missing several other points. I think the crucial point in this regard is understanding the level of interoperability one hopes to achieve with respect to the "interoperating repositories".) I don't think this is a merely pedantic issue right now, because it matters a lot for how we "advertise" OLAC. Do we say, "OLAC is all about search!" (my simplification of something Helen said) or do we say, "OLAC aims for digital linguistic utopia starting with search!"? (For what it's worth, I don't really care strongly about which path OLAC takes, but I would like to be confident I'm describing OLAC's goals correctly to other people.)

> aggregated catalogs, and (3) maintaining a catalog to help users find resources. The OLAC metadata standard is, of course, the specification for how to create an entry for the catalog.

I'm actually somewhat confused by the fact that you say OLAC is maintaining a catalog of resources. It was my understanding that OLAC is right now only maintaining one kind of "catalog", but not one of resources. Rather, it maintains a list of participating archives. The full catalogs of resources (for linguists, at least) are maintained by the two service providers: LINGUIST and the LDC. (I know there are lots of connections between OLAC and these catalogs, but, strictly speaking, I didn't think OLAC was in the catalog maintenance business but, rather, defined a way through which a catalog could be maintained by outside parties.)

> book, rather than an individual recorded session.
> Thus I think this interpretation of desired granularity is straightforwardly implied by the OLAC mission of creating a virtual library. If you have any ideas of specific wording changes in the granularity guidelines that might help to clarify this, I'll be glad to hear them.

I think your response already has all of the required points. Helen seemed to suggest adding an explanation to the mission statement. I'll let you and Steven decide if that's appropriate. (I'm not sure what the process is for adding explanations to the mission statement.)

With respect to the guidelines, I recommend changing the first paragraph of the granularity discussion to something like the following (based on my understanding of your explanation):

"Determining the right level for units to be described as language resources in the OLAC context involves multiple factors. The level of unit appropriate for inclusion in an aggregated catalog like OLAC's may be different (typically higher) than the level desirable for the catalog of a specific institution's holdings, which in turn is typically higher than the level desirable for describing the detailed contents of a resource. Consistent with its mission to create a virtual _library_ of language resources, a basic rule of thumb for making determinations regarding what kinds of units to treat as language resources should be that they should be comparable to the kinds of units treated as resources in a traditional library catalog. For example, libraries typically assign a single record to each book, not to each chapter within a book. A parallel example in the OLAC context would be treating all the objects associated with a particular field trip as a single unit rather than treating each of the individual resources created during that field trip as separate units. The following discussion is aimed at assisting an OLAC participant to find the right level of description."

It might then be nice to give lots of concrete examples; maybe you could get some of the participating archives to do this?

One thing I deleted from that paragraph was the reference to the recommendation given in the Repository guidelines:

"A metadata repository should not degrade the 'signal-to-noise ratio' for language resource discovery."

I don't find this recommendation very helpful because (for me, at least) it is too dependent on what kinds of resources I want to discover. In other words, "language resource discovery" is too broad an activity for there to be one "signal-to-noise ratio". For example, if I already know I'm looking for resources on Nahuatl, I would probably not want to find a record saying, "There's a bunch of material on Nahuatl that's part of some bundle over at AILLA." The signal would be too weak for me; what I'd prefer is the search result I'd get from AILLA's catalog. Of course, for the next person, lots of detailed records about Nahuatl would constitute "noise". Signal and noise just don't strike me as constant enough to form the basis of a recommendation.

I also don't like that this recommendation privileges language resource _discovery_ over other possible uses of the catalog. For example, library catalogs have at least one other function in addition to discovery: retrieval. Often I know a resource exists, but I don't know where it is, which is why I consult the catalog (this is my primary use of WorldCat, for example). (The word "discovery" is potentially ambiguous enough to cover "find something previously unknown" and "retrieve", but that's not my initial reading.) So, I would prefer a recommendation that was more agnostic regarding the use of the metadata.

I personally find your new discussion of provenance in the metadata usage guidelines much more helpful than 'signal-to-noise ratio', since it's not dependent on particular uses of OLAC service providers.
So, I'd actually recommend revising the present granularity recommendation in the repository guidelines to something like:

"A metadata repository should treat resources with a single provenance as constituting a single unit with respect to OLAC metadata and should, therefore, describe them within a single record."

Another advantage to talking about granularity in terms of provenance, in my view, is that the current guidelines seem to be asking data providers to hypothesize about what search scenarios their data will be put to, but I don't think it's reasonable to expect data providers to be very good at this, or even to ask them to spend time thinking about this. That's a job for service providers. Framing the issue in terms of provenance allows data providers to use a kind of information they are, in principle, experts about to structure their collections, which is presumably a good way to achieve consistency. Furthermore, it allows service providers to be reasonably confident that they are aggregating records of the same basic kind from different data providers. It is thus more consonant with the overall OAI model, wherein data providers and service providers interact in terms of a well-defined series of agreements without the one having to pay attention to the internal activities of the other.

Jeff

From hdry at LINGUISTLIST.ORG  Sun Mar 30 14:20:53 2008
From: hdry at LINGUISTLIST.ORG (Helen Aristar-Dry)
Date: Sun, 30 Mar 2008 10:20:53 -0400
Subject: Reminder: Call for review of new metadata documents
In-Reply-To: <08921381-95C5-4081-A5BD-7430FF929B48@buffalo.edu>
Message-ID: 

Extremely sensible remarks, Jeff. I agree especially with the points about 'signal to noise' ratio and think that Gary's remarks on provenance or your revision, which gives an example, would be much more helpful.

-Helen

Jeff Good wrote:
> Dear Gary (and others),
>
> Thanks a lot for that clarification.
The relationship between the > mission statement and the granularity guidelines is much clearer to me > now. I agree with Helen that the reasoning should be made explicit > somewhere more prominent than this list. Your interpretation is not how > I have interpreted the mission statement largely because I missed out on > the significance of the word "library". I also find "virtual library" to > be ambiguous between what one might call a "digital library" and what > one might call an "aggregated library" (the latter sense being my label > for your understanding of the OLAC use). > > I think it might be worth adding two questions to the FAQ (or answering > these questions in some other appropriate place): (i) What does OLAC > mean by "virtual library"? and (ii) What does OLAC mean by "language > archive"? That should help a lot with possible ambiguities in the > mission statement. > > An important open issue, which is still not clear to me from your > explanation is whether OLAC is focusing on a "card catalog" now because > that's all OLAC ever sees itself doing or if it, instead, views getting > the card catalog part right as the first step towards a deeper kind of > interoperability. (My reading of the mission would be that the latter > interpretation is correct, but I already missed out on the importance of > "library" in the mission. So, I'm probably missing several other points. > I think the crucial point in this regard is understanding what the level > of interoperability one hopes to achieve with respect to the > "interoperating repositories".) I don't think this is a merely pedantic > issue right now because it matters a lot for how we "advertise" OLAC. Do > we say, "OLAC is all about search!" (my simplification of something > Helen said) or do we say, "OLAC aims for digital linguistic utopia > starting with search!". 
(For what it's worth, I don't really care > strongly about which path OLAC takes, but I would like to be confident > I'm describing OLAC's goals correctly to other people.) > > >> aggregated catalogs, and (3) maintaining a catalog to help users find >> resources. The OLAC metadata standard is, of course, the >> specification for >> how to create an entry for the catalog. > > I'm actually somewhat confused by the fact that you say OLAC is > maintaining a catalog of resources. It was my understanding that OLAC is > right now only maintaining one kind of "catalog", but not one of > resources. Rather, it maintains a list of participating archives. The > full catalogs of resources (for linguists, at least) are maintained by > the two service providers: LINGUIST and the LDC. (I know there are lots > of connections between OLAC and these catalogs, but, strictly speaking, > I didn't think OLAC was in the catalog maintenance business but, rather, > defined a way through which a catalog could be maintained by outside > parties.) > > >> book, rather than an individual recorded session. Thus I think this >> interpretation of desired granularity is straightforwardly implied by the >> OLAC mission of creating a virtual library. If you have any ideas of >> specific wording changes in the granularity guidelines that might help to >> clarify this, I'll be glad to hear them. > > I think your response already has all of the required points. Helen > seemed to suggest adding an explanation to the mission statement. I'll > let you and Steven decide if that's appropriate. (I'm not sure what the > process is for adding explanations to the mission statement.) > > With respect to the guidelines, I recommend changing the first paragraph > of the granularity discussion to something like the following (based on > my understanding of your explanation): > > "Determining the right level for units to be described as language > resources in the OLAC context involves multiple factors. 
> The level of unit appropriate for inclusion in an aggregated catalog like OLAC's may be different (typically higher) than the level desirable for the catalog of a specific institution's holdings, which in turn is typically higher than the level desirable for describing the detailed contents of a resource. Consistent with its mission to create a virtual _library_ of language resources, a basic rule of thumb for making determinations regarding what kinds of units to treat as language resources should be that they should be comparable to the kinds of units treated as resources in a traditional library catalog. For example, libraries typically assign a single record to each book, not to each chapter within a book. A parallel example in the OLAC context would be treating all the objects associated with a particular field trip as a single unit rather than treating each of the individual resources created during that field trip as separate units. The following discussion is aimed at assisting an OLAC participant to find the right level of description."
>
> It might then be nice to give lots of concrete examples; maybe you could get some of the participating archives to do this?
>
> One thing I deleted from that paragraph was the reference to the recommendation given in the Repository guidelines:
> "A metadata repository should not degrade the 'signal-to-noise ratio' for language resource discovery."
>
> I don't find this recommendation very helpful because (for me, at least) it is too dependent on what kinds of resources I want to discover. In other words, "language resource discovery" is too broad an activity for there to be one "signal-to-noise ratio". For example, if I already know I'm looking for resources on Nahuatl, I would probably not want to find a record saying, "There's a bunch of material on Nahuatl that's part of some bundle over at AILLA."
> The signal would be too weak for me--what I'd prefer is the search result I'd get from AILLA's catalog. Of course, for the next person, lots of detailed records about Nahuatl would constitute "noise". Signal and noise just don't strike me as constant enough to form the basis of a recommendation.
>
> I also don't like that this recommendation privileges language resource _discovery_ over other possible uses of the catalog. For example, library catalogs have at least one other function in addition to discovery: retrieval. Often I know a resource exists, but I don't know where it is, which is why I consult the catalog (this is my primary use of WorldCat, for example). (The word "discovery" is potentially ambiguous enough to cover "find something previously unknown" and "retrieve", but that's not my initial reading.) So, I would prefer a recommendation that was more agnostic regarding the use of the metadata.
>
> I personally find your new discussion of provenance in the metadata usage guidelines much more helpful than 'signal-to-noise ratio', since it's not dependent on particular uses of OLAC service providers. So, I'd actually recommend the following revision to the repository guidelines regarding granularity from the present recommendation to something like:
>
> "A metadata repository should treat resources with a single provenance as constituting a single unit with respect to OLAC metadata and should, therefore, describe them within a single record."
>
> Another advantage to talking about granularity in terms of provenance in my view is that the current guidelines seem to be asking data providers to hypothesize about what search scenarios their data will be put to, but I don't think it's reasonable to expect data providers to be very good at this, or even to ask them to spend time thinking about this. That's a job for service providers.
> Framing the issue in terms of provenance allows data providers to use a kind of information they are, in principle, experts about to structure their collections, which is presumably a good way to achieve consistency. Furthermore, it allows service providers to be reasonably confident that they are aggregating records of the same basic kind from different service providers. It is thus more consonant with the overall OAI model wherein data providers and service providers interact in terms of a well-defined series of agreements without the one having to pay attention to the internal activities of the other.
>
> Jeff

--
Helen Aristar-Dry
Professor of Linguistics
Director, Institute for Language Information and Technology (ILIT)
Eastern Michigan University
2000 Huron River Rd., Suite 104
Ypsilanti, MI 48197
734.487.0144 (ILIT office)
734.487.7952 (faculty office)
734.482.0132 (fax)
hdry at linguistlist.org

From Gary_Simons at SIL.ORG Mon Mar 31 15:20:47 2008
From: Gary_Simons at SIL.ORG (Gary Simons)
Date: Mon, 31 Mar 2008 10:20:47 -0500
Subject: Last call for review of new metadata documents
In-Reply-To:
Message-ID:

Dear implementers,

Today being the stated last day of the review period, this is the last call for comments on the documents on metadata usage guidelines and metrics. (The original call with the URLs is appended.) That is not to say that we won't accept comments after today; if you send comments in the next few days, we'll gladly receive them. However, we'll start working on the next phase of the process soon.

Thanks to Jeff and Helen for stimulating discussion on the granularity issue. We have good feedback for revising the granularity section of the usage guidelines. Jeff's proposal about replacing the signal-to-noise-ratio statement with one centered on provenance is a good one. That impacts the OLAC Process standard, since that is where signal-to-noise is stated as a principle for judging new registration applications.
A light revision of our standards is in the wings as part of moving from version 1.0 to 1.1 of the metadata standard, so we can address that point then. I also like the suggestion of using the FAQ to bring out these explanations of what is implied by "virtual library" in the mission statement.

Jeff raises a question that I should answer, namely, "I'm actually somewhat confused by the fact that you say OLAC is maintaining a catalog of resources. It was my understanding that OLAC is right now only maintaining one kind of "catalog", but not one of resources. Rather, it maintains a list of participating archives. The full catalogs of resources (for linguists, at least) are maintained by the two service providers: LINGUIST and the LDC."

While the LDC and Linguist search engines are the visible faces of search (and still others are possible given the OAI-PMH model), an important (but not so visible) service provided centrally by OLAC is the OLAC Aggregator. Every 12 hours, OLAC runs an incremental harvest of all registered repositories and offers a single aggregated catalog to the world via the OAI-PMH at the following base URL (which actually generates a useful documentation page if you visit it):

http://www.language-archives.org/cgi-bin/olaca3.pl

This is where, for instance, the mandatory OAI_DC metadata format of the OAI-PMH is implemented. In static repositories, data providers give only OLAC metadata, but OLAC plugs them into the wider OAI-PMH world by providing the crosswalk to OAI_DC format as a value-added service in the single OLACA repository. All of the new work with metrics and quality checks is also based on the aggregated catalog. OLAC does not "maintain" an original catalog in the same way that each data provider maintains its catalog, but we are maintaining the aggregated catalog of the virtual library by harvesting every day to keep it up to date and by doing checks to maintain quality.
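For implementers who would rather pull the aggregated catalog from OLACA than from each repository, here is a minimal Python sketch (standard library only) of the OAI-PMH side of that exchange. It assumes OLACA answers the standard OAI-PMH 2.0 ListRecords verb with the oai_dc metadata prefix at the base URL above; the `from` argument is what keeps a harvest incremental, and the resumptionToken handling follows the OAI-PMH paging rules. The helper names `list_records_url` and `parse_list_records` and the sample response (including its record identifiers) are hypothetical, made up for illustration; nothing here is taken from the OLAC harvester itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL of the OLAC Aggregator (OLACA), as given in the message.
OLACA_BASE_URL = "http://www.language-archives.org/cgi-bin/olaca3.pl"

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument in OAI-PMH: it is sent
        # on its own, without metadataPrefix or from.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Ask only for records whose datestamp is on or after this date;
            # this is what makes a harvest incremental.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken element marks the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# A made-up response page: two records plus a token for the next page.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:rec1</identifier></header></record>
    <record><header><identifier>oai:example.org:rec2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(OLACA_BASE_URL, from_date="2008-03-30"))
    print(parse_list_records(SAMPLE_PAGE))
```

The sketch never opens a network connection; a real harvester would still fetch each URL (for example with `urllib.request.urlopen`), pass the body to `parse_list_records`, and keep requesting with the returned token until it comes back as `None`.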
OLACA can also serve as the single point of contact for anyone who wants to implement a service based on OLAC metadata. There are two possible approaches: to run the OLAC harvester independently (and create one's own aggregated catalog) or simply to harvest from the pre-aggregated OLACA data provider.

Jeff poses another question that begs for an answer: Do we say, "OLAC is all about search!" (my simplification of something Helen said) or do we say, "OLAC aims for digital linguistic utopia starting with search!".

The latter statement is closer, if you substitute "descriptive metadata" for "search". But I hasten to add that when that utopia is reached, we won't call the result OLAC, just as we don't confuse the web with the W3C. My plenary talk at last summer's workshop, "Toward the interoperability of language resources," paints a picture of such a utopia as an interoperating cyberinfrastructure for linguistics (and gives a diagram on the closing slide):

http://linguistlist.org/tilr/papers/TILR%20Plenary%20Slides.pdf

Of the 12 elements in the infrastructure, elements 1 through 4 (Aggregator, Metadata standard, Submission protocol, and Harvesting protocol) are specifically identified as being OLAC's contribution. Those standards are what make it possible for the other 8 elements to be built in such a way that they interoperate with each other, at least at the common-denominator level defined by the metadata standard. The OLAC process also provides a way for the community developing this infrastructure to define additional standards that promote community interoperation, but the vision also includes specialized subcommunities getting together to define more specific standards for their own areas of focus.

I hope that helps clarify things.
Best,
-Gary

Gary Simons
Sent by: OLAC Implementers List
03/05/2008 10:35 PM
Please respond to Open Language Archives Community Implementers List

To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG
Subject: Call for review of new metadata documents

Dear implementers, Many of you also subscribe to the OLAC-GENERAL list and so have gotten the general announcement about this call for review for new metadata documents. Those of you who have implemented an OLAC data provider are directly affected since this new work focuses on ways of improving the quality of the metadata in our implementations. In this message we repeat the general announcement for the benefit of those not subscribed to OLAC-GENERAL, and then we supply further information that is relevant to you as implementers. Six months ago the US National Science Foundation awarded funding for a project named "OLAC: Accessing the World's Language Resources" which aims to greatly improve access to language resources for linguists and the broader communities of interest. If you are interested in learning more about the project, you may visit the project home page at: http://olac.wiki.sourceforge.net/ In the first phase of the project we are focusing on improving metadata quality as a prerequisite to improving the quality of search. To that end we have drafted some new documents that can serve as a basis for improving and measuring metadata quality within our community: Best Practice Recommendations for Language Resource Description http://www.language-archives.org/REC/bpr.html OLAC Metadata Usage Guidelines http://www.language-archives.org/NOTE/usage.html OLAC Metadata Quality Metrics http://www.language-archives.org/NOTE/metrics.html These documents have been reviewed in Draft status by the Metadata Working Group. After significant revision, they are now promoted to Proposed status and are thus ready for review by the entire community. In keeping with the OLAC Process standard, we hereby make a formal call for review.
The review period will end on MARCH 31, at which point all of the comments that have been received will be processed to create revised versions of the documents. You may submit comments by simply replying to this message. The OLAC Metadata Standard that you followed in implementing your repository defines the constraints on validity for a metadata record, but it gives no advice about what a high quality metadata record is like. The first two documents listed above address this issue. Then, in keeping with the OLAC core value of "Peer Review", we want to implement a service that will measure conformance to the recommendations that can be automatically tested for. That is the issue addressed by the third document listed above. We have implemented the proposed Metadata Quality Score so that you can see the implications for your current metadata. (As the documents are revised to express community consensus, the implementation of the metrics will be updated to match.) The metadata quality analysis as currently implemented is accessible from a test version of the Participating Archives page. The site has no links to this test page; it is accessed by entering this URL in a browser: http://www.language-archives.org/archives-new.php Follow the "Sample Record" link for your archive to see the quality score for the sample record named in your Identify response, along with comments on what can be done to improve the score. Follow the "Metrics" link to see the average quality score for the records you are currently providing. Kudos to the Audio Archive of Linguistic Fieldwork (Berkeley), Centre de Ressources pour la Description de l'Oral (CRDO), and the CHILDES Data Repository who are already getting scores around 8 or higher. The rest of us have room for significant improvement! Eventually, this new Participating Archives page will replace the one that is currently accessed from the ARCHIVES link in the OLAC site banner. However, this will not happen right away. 
After the current round of review and any subsequent revisions, the documents will be put to the OLAC Council, who will check the revised documents and promote them to Candidate status when they feel they are ready. Next we will issue a call for implementation and give at least one month for implementer feedback. Based on that feedback, final revisions will be made to the satisfaction of the Council who will then grant Adopted status. The new Participating Archives page will not replace the current one until the new guidelines and metrics are adopted. This discussion of process is to let you know that you will probably want to plan to update the implementation of your metadata repository some time within the next few months. When these new metadata recommendations and usage guidelines are officially adopted, the public will be able to see the metrics scores for your repository. In the meantime, it is just other implementers who are seeing them. You need not wait until the Candidate call for implementation to begin implementing changes. As soon as your updated repository is harvested, you will see the metrics change. Again, the review period will end on MARCH 31, at which point all of the comments that have been received will be processed to create revised versions of the documents. You may submit comments by replying to the list (and potentially entering into discussion with other implementers) or by mailing them to . That account is handled by Debbie Chang, a Masters candidate at the Graduate Institute of Applied Linguistics who is the Research Assistant for our project. She will compile a list of all the comments (whether submitted to the list or to the project account), which the document editors will then be asked to respond to. That response will come after the review period closes. 
With a solid foundation based on quality metadata, our grant project will be able to build improved search services and to expand coverage by attracting more participating archives and by implementing gateways to other aggregators. We are grateful for your participation in this venture and trust that you share our excitement about its potential. Best wishes, Gary & Steven _______ Steven Bird, University of Melbourne and University of Pennsylvania Gary Simons, SIL International and GIAL OLAC Coordinators (www.language-archives.org)