From hdry at LINGUISTLIST.ORG Sun Nov 3 15:29:40 2002 From: hdry at LINGUISTLIST.ORG (Helen Aristar Dry) Date: Sun, 3 Nov 2002 10:29:40 -0500 Subject: Some comments on the LINGUIST service provider In-Reply-To: <200210030138.g931cmM07394@unagi.cis.upenn.edu> Message-ID: Thanks, very much, Steven. I think you're definitely right on the last two counts and probably on the first. We added the search blank on the first page at your suggestion before LREC, remember? We thought it was a good idea. But it did mean that we now had three levels of search: search blank, simple search, and advanced search. I've been meaning to do exactly what you suggest--collapse simple and advanced--but had let it slip down on my infinitely long to-do list. You know how it is! Thanks for the nudge! I think the first page could indeed be shortened. And your idea of adding a "more about OLAC" link is a good one. But remember that you're coming from a situation where you already know what OLAC is. Most of our readers haven't a clue. So maybe some explanation is really necessary. We'll take another look . . . Again, thanks for the help. It's always useful to have your feedback. Hope OZ is treating you well. We miss having you just down the road.... -Helen Date sent: Wed, 2 Oct 2002 21:38:48 EDT Send reply to: Steven Bird From: Steven Bird Organization: University of Melbourne Subject: Some comments on the LINGUIST service provider To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG > One of the workshop preparatory tasks is: > > > 3. Review: choose three participating archives besides your own and > > suggest improvements to their use of metadata; review the > > www.language-archives.org site and the www.linguistlist.org/olac/ > > service and suggest improvements. > > I have three low-level comments on the LINGUIST service provider. I hope > this feedback will make the service even better than it already is... 
> > a) The first page you come to is a long document with a search form some > way down. > > I'd favor a very simple page (cf www.google.com) consisting of a search > box, a link to the advanced search, and a link to "more about OLAC" which > has all the original text. > > b) Users wanting "more powerful search" are directed to the "OLAC Query > page". (Weren't we just on an OLAC query page?) Arriving on this new page, > we see that it is called "OLAC Query Form: Simple Search". This is > confusing, since we've just come from a simple search page expecting the > more powerful search page, only to find that this is still only simple > search. There's no pointer back to the really simple search. > > I'd prefer this to be called "Advanced Search" (both on the title and the > incoming link), with a backpointer to the simple search. > > c) This second page points to yet another page, called Advanced Search. > However, this generates an error: "ODBC Error Code = S1000 (General error) > [TCX][MyODBC]Table 'OLAC.alltypes' doesn't exist". I expect this really > advanced search permits search on all fields. > > I'm not convinced we need three levels of search. Could the second and > third levels be collapsed into a single level, containing all the search > fields? > > Does anyone else have comments on this service? > > -Steven > > -- > Steven Bird Email: Web: http://www.cs.mu.oz.au/~sb/ > A/Prof, Dept of Computer Science, University of Melbourne, Vic 3010, AUSTRALIA > Senior Research Assoc, Linguistic Data Consortium, University of Pennsylvania From hdry at LINGUISTLIST.ORG Sun Nov 3 16:04:30 2002 From: hdry at LINGUISTLIST.ORG (Helen Aristar Dry) Date: Sun, 3 Nov 2002 11:04:30 -0500 Subject: Some comments on the LINGUIST service provider In-Reply-To: <005401c26ac7$142f1020$c800000a@50bneave.net> Message-ID: OOOPs...thanks very much, Baden. We will deal with the broken link asap. About the search suggestion: our "simple search" was intended to do exactly what you describe . 
. . with the addition of the ability to search by language, which we thought many linguists would want. The addition of a single search blank on the home page has sort of unbalanced the system--as Steven points out, 3 levels of search seem too much. But he suggests having a search blank, plus a full search. I guess I just need to think about whether there's some way to do both what he suggests and what you suggest. I don't see it right now. But I really appreciate the feedback. Thanks for taking the time to make these suggestions. All the best, -Helen Date sent: Thu, 3 Oct 2002 20:24:37 +1000 Send reply to: baden at compuling.net From: Baden Hughes Organization: CompuLing Subject: Re: Some comments on the LINGUIST service provider To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG > From dealing with some new end users who have been introduced to OLAC > via the Linguist interface, I've got a couple of related comments. > > Users would like to have a simple search - by title, author, description > and subject language. This would mean author would be added to the > existing Quick Search. > > There is a difference between the number of archives actively searched > on the LL site and those registered at the OLAC site. I would have > assumed automated harvesting of the new archives as they are registered > at either location? > > > An ultra-low-level comment: when you click on the link at the bottom of > the LinguistList OLAC page: > > "If you would like to help with the OLAC enterprise, please let us know! > > Thank you in advance for your help!" > > An email message is launched, but there's no email address to send > things to (i.e. the mailto: link is malformed). > > Baden From sb at CS.MU.OZ.AU Mon Nov 4 01:52:55 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Sun, 3 Nov 2002 20:52:55 EST Subject: Some comments on the LINGUIST service provider In-Reply-To: Your mail dated Sunday 3 November, 2002.
Message-ID: Helen et al, The new search pages on the LINGUIST site are greatly improved! Here are some further suggestions concerning layout and features. I'm sending them to the list to see if others have opinions about any of this. First, I think the search field should precede the descriptive text, so that nothing at all gets in the way of users doing searches (cf search engines). As it is, there is already a fixed banner and sidebar that the user has to skip over. Second, the search section is ambiguous. Should the user click on "Search the OLAC catalog" or should they enter keywords and click the search button? I think the search icon could be omitted, since you already have a link to advanced search. Third, it would be helpful to be able to browse the contents of an individual archive. This is akin to the archive selection you can make on the advanced form, but it doesn't require extra search fields and could be made more accessible by putting it on the front page. The interface could parallel the keyword search field:
  Keyword(s): [TEXT FIELD] (search button)
  Browse: [ARCHIVE MENU] (browse button)
Fourth, in cases where there are several screenfuls of hits, it would be helpful to be able to sort the search results by title, creator, date, subject language, archive, or type. This just requires an "order by" clause on the SQL side, and could be presented as a pulldown menu on the advanced search form and on the search results form. Anyway, those are some ideas I'm throwing out. Thanks for the service you're already providing. Steven Bird From sb at CS.MU.OZ.AU Mon Nov 4 06:47:27 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Mon, 4 Nov 2002 01:47:27 EST Subject: Where to host OLAC schemas Message-ID: Recently there was some discussion about hosting the schemas for certain OLAC vocabularies off-site at a location that is directly managed by the author of the schema, mainly because this would be more convenient for the author.
After some consideration, Gary and I now believe that these documents should be hosted on the OLAC site. Here are several reasons why:
1. Only mature vocabularies achieve community-wide acceptance, and we anticipate a slow rate of change by the time a vocabulary is approved at the OLAC level. Therefore the updates to the schema will be infrequent. The process of installing a revised schema on the OLAC site would involve a second level of quality control, and this is appropriate for an OLAC standard that many other systems depend on.
2. Any recommendations and implementation notes pertaining to the vocabulary will already be posted on the OLAC site. New versions of these documents may need to be created when a vocabulary is updated. Transferring an updated schema to the OLAC site does not represent a significant additional burden.
3. When several sites host official OLAC schemas, only one of the sites has to be down in order for validation to be impossible.
4. People can see that a schema is endorsed by OLAC simply from its location, rather than having to inspect the XML schema for the OLAC metadata format. If OLAC vocabularies are ever adopted into other DCMI application profiles then the link to OLAC is maintained.
5. We draw a clear division between OLAC extensions and third-party extensions, when schemas that are approved by the community are put in the OLAC namespace and hosted on the OLAC site.
6. Posting material from many authors on the OLAC site demonstrates that building OLAC is a community effort. Additionally, authors of OLAC vocabularies and the associated schemas are able to demonstrate the impact of their work when they can point to their documents posted on the OLAC site.
Convinced? Please let us know what you think.
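Point 3 can be made concrete with a rough worked calculation. It assumes, purely for illustration, that each of n sites hosting official schemas is reachable independently with the same probability p; neither the independence assumption nor the figures below come from the discussion. A validator that needs schemas from every site succeeds only when all of them are up:

$$P(\text{validation possible}) = p^{n}$$

$$p = 0.99,\ n = 3:\qquad 0.99^{3} = 0.970299$$

Under these assumptions validation would fail roughly 3% of the time with three hosts, against 1% with a single host of the same reliability.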
Thanks, Steven Bird From baden at COMPULING.NET Mon Nov 4 09:11:27 2002 From: baden at COMPULING.NET (Baden Hughes) Date: Mon, 4 Nov 2002 19:11:27 +1000 Subject: Updated Schemas for OLAC Language Technology Description Message-ID: I've updated several of the experimental schemas I posted at http://www.compuling.net/projects/olac The major update is that the OLAC-Functionality schema now reflects the classifications provided at LT-World (http://www.lt-world.org) rather than the earlier ones from the HLT Survey. Comments welcome as usual. Regards Baden From baden at COMPULING.NET Tue Nov 5 06:30:30 2002 From: baden at COMPULING.NET (Baden Hughes) Date: Tue, 5 Nov 2002 16:30:30 +1000 Subject: Where to host OLAC schemas In-Reply-To: <200211040647.gA46lRM12099@unagi.cis.upenn.edu> Message-ID: > 1. Only mature vocabularies achieve community-wide > acceptance, and we anticipate a slow rate of change by the > time a vocabulary is approved at the OLAC level. Therefore > the updates to the schema will be infrequent. The process of > installing a revised schema on the OLAC site would involve a > second level of quality control, and this is appropriate for > an OLAC standard that many other systems depend on. In essence, it sounds as if you're proposing an editorial review of changes to core schemas. As with similar community-based standards initiatives, I think this is a welcome development. Certainly it should enhance the quality of the schemas that do make it into the OLAC standard. The next question that follows from this is "who will do this review?" Additionally, some form of centralised version control would be advantageous in this context. > 2. Any recommendations and implementation notes pertaining to > the vocabulary will already be posted on the OLAC site. New > versions of these documents may need to be created when a > vocabulary is updated. Transferring an updated schema to the > OLAC site does not represent a significant additional burden.
Agreed, especially if there is a documented process by which revisions are proposed, implemented and disseminated. > 3. When several sites host official OLAC schemas, only one of > the sites has to be down in order for validation to be impossible. This is probably the most significant point. If schemas developed by individuals are encouraged, it is likely that for various reasons there may be times when the hosting site is unavailable owing to the infrastructure choices made by individuals. Institutions have a greater chance of providing high-availability network and hardware solutions to serve this purpose. However ... the same criticism can still be levelled even at this approach - if all the schemas are hosted on the OLAC server, then it also becomes a single point of failure with regard to validation. A better approach would be to have some kind of mirroring arrangement (this is probably beyond the scope of this list) which would ensure that multiple sites held the authoritative version of a schema and could be switched between as necessary. > 4. People can see that a schema is endorsed by OLAC simply > from its location, rather than having to inspect the XML > schema for the OLAC metadata format. If OLAC vocabularies > are ever adopted into other DCMI application profiles then > the link to OLAC is maintained. This is the most important of the points raised IMHO. The fact that interoperability comes for free by using this process is of community-wide benefit. > 5. We draw a clear division between OLAC extensions and > third-party extensions, when schemas that are approved by the > community are put in the OLAC namespace and hosted on the OLAC site. > > 6. Posting material from many authors on the OLAC site > demonstrates that building OLAC is a community effort. > Additionally, authors of OLAC vocabularies and the associated > schemas are able to demonstrate the impact of their work when > they can point to their documents posted on the OLAC site.
This will serve to encourage innovation and excellence. As third-party schemas are tested and improved, the OLAC process allows for consensus to be formed about the adoption of such schemas into the OLAC namespace. In other words, kudos for writing a good schema implementation. As a third-party schema developer, I find this all reasonable. Of course, there's probably a point at which I should be following some "best practice" in terms of schema development and promotion, but I guess we can add that to the list of things to be done :-) Baden From sb at CS.MU.OZ.AU Wed Nov 6 06:37:16 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Wed, 6 Nov 2002 01:37:16 EST Subject: workshop program Message-ID: Folks, A draft program is now posted on the workshop website at: http://www.language-archives.org/events/olac02/ We've structured the program to maximize the chances of reaching consensus on the core infrastructure and getting to the point where we can launch version 1.0 of the metadata set, including a core set of vocabularies. We also want this meeting to set the agenda for 2003. As advertised, there is no time for individual conference-style paper presentations. However, the meeting will open with a session in which every archive and service will be briefly introduced, and there will be ample informal discussion time for this (2.5 hours of breaks during the day and free evenings). The second half of Wednesday afternoon is scheduled as an open forum for new initiatives of community-wide interest, so please keep the ideas coming. Please let us know if we've left anything out, or if you think we need to adjust time allocations etc. Thanks, Steven Bird From haejoong at UNAGI.CIS.UPENN.EDU Wed Nov 6 18:25:59 2002 From: haejoong at UNAGI.CIS.UPENN.EDU (Haejoong Lee) Date: Wed, 6 Nov 2002 13:25:59 -0500 Subject: OLAC suite updates Message-ID: Dear OLAC implementers, A new OLAC suite has been released with new features!
The OLAC suite is a combination of the OLAC harvester (Ovester) and the OLAC aggregator (OLACA), written in Perl. Ovester harvests OAI records from OLAC data providers and stores them in the OLAC MySQL database. OLACA exports the records in the OLAC database using OAI-PMH. Together they can be used to implement useful services over OLAC archives. Please check the new features below. For more details, please read the ChangeLog and README included in the OLAC suite package.
New features:
Ovester (OLAC harvester)
- Central db lookup for the harvest list: by default, Ovester tries to download the harvest list from the remote server (the language-archives web server).
- Database synchronization: the archives in the OLAC database are synchronized with those in the harvest list, i.e. archives that are not listed in the harvest list are cleaned up in the database.
- Use of a separate database account information file: the database account information is not hardcoded in the Ovester code, but kept in a separate file for security.
- The features listed above are controlled by command-line options. For example, you can turn the synchronization on or off using the -p option.
OLACA (OLAC Aggregator)
- Query verb: a new verb, "Query", has been added. Please see http://www.language-archives.org/NOTE/query.html
- OAI-PMH 2.0: OLAC suite 2 contains an OAI-PMH 2.0 compliant OLACA.
Downloads:
OLAC suite (Ovester: OAI-PMH 1.1, OLACA: OAI-PMH 1.1): http://www.language-archives.org/tools/olac_suite.tgz
OLAC suite 2 (Ovester: OAI-PMH 1.1, OLACA: OAI-PMH 2.0): http://www.language-archives.org/tools/olaca_suite2.tgz
Thanks, Haejoong From Gary_Simons at SIL.ORG Fri Nov 8 17:51:36 2002 From: Gary_Simons at SIL.ORG (Gary Simons) Date: Fri, 8 Nov 2002 11:51:36 -0600 Subject: Peer review of archives in preparation for workshop Message-ID: Dear workshop participants, You will recall that the third goal of our upcoming workshop in December, as stated in the Workshop Overview on the web site, is: 3.
Review: To give feedback to each participating archive on its use of metadata, to review the services on the OLAC and LINGUIST sites. We have also warned you that we wanted each participant to do some preparatory tasks prior to the workshop, including reviewing metadata from three archives besides your own. Joan Spanne, the archivist for SIL International, has agreed to help us by collating the results of these individual archive reviews and to make a presentation on the "State of the Archives" at the workshop. In addition to the benefit to each archive of getting constructive peer review, we anticipate that another key outcome will be improvements to our metadata guidelines and identification of more best practice recommendations. In order to facilitate this review process, we have worked with Joan to develop a peer review form, which is attached. We have also worked out specific review assignments. Each workshop participant has been assigned to review specific archives. Consult the following web page to see which archives you have been assigned to: http://www.language-archives.org/events/olac02/reviews.html The full instructions on how to perform the review are given in the attached review form (which is also accessible via a link at the top of the web page just mentioned). This should not be a time consuming process. We anticipate that a single review can be completed within 30 minutes. You may also need to spend some time familiarizing yourself again with the relevant OLAC standards. Links to these are given in the detailed instructions. The reviews for each archive will be collated and sent to the contact person for the archive as anonymous reviews. Of course, the web page of review assignments gives some clue as to who reviewers might be, but it will be impossible to know exactly who said what, so we trust there will be an adequate level of anonymity. 
The actual anonymity will be increased by the fact that there will often be reviewers in addition to the ones named on the assignment page. After the due date, we will ask some of you who have shown a knack for this sort of review to fill in some gaps left by reviews that may not have come in. You are also encouraged to submit reviews of any additional archives you please at your own initiative. The deadline for submission of completed reviews is two weeks from today, FRIDAY, 22 NOVEMBER 2002. And early returns will be appreciated, too! Address completed reviews to: joan_spanne at sil.org, olac-admin at language-archives.org We look forward to good feedback from all of you. Don't hesitate to contact us if you have any questions. Best, Gary Simons (and Steven Bird) (See attached file: review-form.txt) From Alexis.Dimitriadis at LET.UU.NL Sat Nov 9 10:36:53 2002 From: Alexis.Dimitriadis at LET.UU.NL (Dimitriadis, Alexis) Date: Sat, 9 Nov 2002 11:36:53 +0100 Subject: Peer review of archives in preparation for workshop Message-ID: Hi, I just looked at the archive list, and I do not see my name as a reviewer. Perhaps this is because, as I discovered just yesterday, I was not subscribed to the OLAC-IMPLEMENTERS list. (There was enough material coming in from OLAC-METADATA that I did not realize I was missing something important). I will be attending the workshop as a representative of the TDS/LTRC group in the Netherlands, and would very much like to do my part!
Alexis _____________________________________________ Alexis Dimitriadis alexis.dimitriadis at let.uu.nl +31-30-253-6219 Utrecht Institute of Linguistics OTS Trans 10 3512 JK Utrecht The Netherlands From Alexis.Dimitriadis at LET.UU.NL Sat Nov 9 10:49:11 2002 From: Alexis.Dimitriadis at LET.UU.NL (Dimitriadis, Alexis) Date: Sat, 9 Nov 2002 11:49:11 +0100 Subject: Oops! Message-ID: My apologies, I did not intend to send the last message to the entire list! Alexis -----Original Message----- From: Dimitriadis, Alexis [mailto:Alexis.Dimitriadis at let.uu.nl] Sent: Saturday, 09 November, 2002 11:37 To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG Subject: Re: Peer review of archives in preparation for workshop ...
From sb at CS.MU.OZ.AU Sun Nov 10 23:52:12 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Sun, 10 Nov 2002 18:52:12 EST Subject: New report on OLAC infrastructure Message-ID: Folks, Gary Simons and I have recently written a 12-page report containing a comprehensive and up-to-date overview of OLAC technical infrastructure. The final draft, now under review, is posted at: http://www.language-archives.org/docs/lht-draft.pdf Although this paper is directed at a wider digital libraries audience, we hope it will help workshop participants prepare for our technical discussions. Comments welcomed. Steven Bird From churen at GATE.SINICA.EDU.TW Tue Nov 12 07:49:08 2002 From: churen at GATE.SINICA.EDU.TW (Chu-Ren Huang) Date: Tue, 12 Nov 2002 02:49:08 EST Subject: Suggestion for adding a Proofreader role Message-ID: Dear All: We would like to suggest the addition of a Proofreader role to OLACMC. This suggestion was originally made in a paper (Ru-Yng Chang and Chu-Ren Huang. 2002. OLACMS: Comparisons and Applications in Chinese and Formosan Languages. Proceedings of The 3rd Workshop on Asian Language Resources and International Standardization, A Post-COLING2002 workshop.) Since all digitized material must be proofread to ensure quality, it is important to know who the proofreader is. A proofreader should be able to refer to an individual, a team, or a standard procedure. This is very crucial for heritage data, since a competent proofreader may require some reading skill of a language that is no longer in use. Take classical Chinese text (or Latin) for example. To construct an archive based on classical Chinese texts, input and proofreading cannot be avoided. 
The standard procedure here at Academia Sinica involves inputting the text twice (manually or automatically, but by different teams), running an automatic proofreading program on the two versions, and then having human proofreaders go over the differences for at least three more runs (while proofreading the identical parts quickly). Such a standard procedure can ensure data integrity and accuracy to a very high standard (less than 1% of errors remain). The more traditional way requires 7 runs of proofreading, with one proofreader reading the text backwards (to avoid automatic self-correction in reading). In other words, a text proofread by the AS team should be highly reliable, whereas a text archived by a Chinese graduate student, input and proofread by him/herself, may not reach the same level. Chu-Ren From sb at CS.MU.OZ.AU Tue Nov 12 08:04:00 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Tue, 12 Nov 2002 03:04:00 EST Subject: Suggestion for adding a Proofreader role In-Reply-To: Your mail dated Tuesday 12 November, 2002. Message-ID: I forwarded this to the METADATA list - please respond to this posting there. http://lists.linguistlist.org/archives/metadata.html -Steven Bird From martin.wynne at OTA.AHDS.AC.UK Wed Nov 13 16:39:35 2002 From: martin.wynne at OTA.AHDS.AC.UK (Martin Wynne) Date: Wed, 13 Nov 2002 16:39:35 -0000 Subject: A simpler format for OLAC vocabularies and schemes Message-ID: Gary, Sorry about my long response time, but I'm just catching up on mail from the last month or so on this list in preparation for the workshop. The following is with reference to Steve and Gary's postings from 31st October. If I understand correctly, the proposal is to move from: Original title Translated title to: Original title Translated title I've checked with the relevant DCMI website and this does indeed seem to be in conformance with their recommendations. Now perhaps in that case I should take this up with the DCMI and not with OLAC...
What I can't see is why the <alternative> element is neither embedded in the <title> element, nor identified as a type of title element (as in OLAC 0.4). To put it bluntly, a human can't see what it is an alternative type of, since it is not tagged in any obvious way as a title. Does this require that "alternative" be defined (somewhere?) as a type of title? And in this case, shouldn't it be called "alternativeTitle" or something more transparent? Up to this point I am heartily in accordance with the suggestions for simplifying the formats. I agree that syntactic conformance with DC is a good thing, but they now seem to be going down a road which aims to flatten out any hierarchical organisation of the data classification, and makes human readability of the XML impossible. Further apologies if I've got the wrong end of the stick here by coming in belatedly to the discussion. Best, Martin From Gary_Simons at SIL.ORG Thu Nov 14 03:24:41 2002 From: Gary_Simons at SIL.ORG (Gary Simons) Date: Wed, 13 Nov 2002 21:24:41 -0600 Subject: A simpler format for OLAC vocabularies and schemes Message-ID: <WED.13.NOV.2002.212441.0600.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> Martin, This is a good question. Let me take a stab at answering: >If I understand correctly, the proposal is to move from: > > <title>Original title > Translated title > >to: > > Original title > Translated title > >I've checked with the relevant DCMI website and this does indeed seem to be >in conformance with their recommendations. Now perhaps in that case I should >take this up with the DCMI and not with OLAC... What I can't see is why the ><alternative> element is neither embedded in the <title> element, nor identified >as a type of title element (as in OLAC 0.4). First, let me make sure everyone understands what dcterms is. It is the namespace for all of the refinements defined in the DC Qualifiers recommendation.
Thus, there is also:
  <dcterms:hasPart>A qualified Relation</dcterms:hasPart>
  <dcterms:temporal>A qualified Coverage</dcterms:temporal>
and so on. > To put it bluntly, a human >can't see what it is an alternative type of, since it is not tagged in any >obvious way as a title. Does this require that "alternative" be defined >(somewhere?) as a type of title? And in this case, shouldn't it be called >"alternativeTitle" or something more transparent? It is quite true that the XML file gives no clue as to the corresponding non-qualified element. However, there is no ambiguity, since each DC refinement is defined to occur with only one DC element. That is, <alternative> is defined to be a refinement of Title and nothing else, <temporal> of Coverage, and so on. The mapping problem is solved in implementation by adding to the harvested metadata database a table that pairs each refinement with its non-qualified element. This allows a service provider to "dumb down" the tags in the dcterms namespace to their dc equivalents. The standard OLAC harvester will have this built in. >Up to this point I am heartily in accordance with the suggestions for >simplifying the formats. I agree that syntactic conformance with DC is a >good thing, but they now seem to be going down a road which aims to flatten >out any hierarchical organisation of the data classification, and makes >human readability of the XML impossible. Note that the hierarchical organisation is in the classification scheme, and not in the data itself. That is why it is appropriate for data encoding to be "flattened". There only needs to be one instance of the classification hierarchy (e.g. the database table I mention above, or the DCMI's RDF schema for dcterms), and a flattened tag can be looked up in that hierarchy rather than repeating the refinement-to-element mapping in every instance of the refinement. >Further apologies if I've got the wrong end of the stick here by coming in >belatedly to the discussion. I hope that makes sense.
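The refinement-to-element table Gary describes can be sketched in a few lines. This is a minimal Python illustration, not the OLAC harvester's code (the harvester in the OLAC suite is written in Perl): a dictionary stands in for his database table, the table lists only a subset of the DC Qualifiers, and the `<record>` wrapper and the names `split_tag`, `dumb_down` and `REFINEMENT_TO_ELEMENT` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for unqualified Dublin Core and for the DC Qualifiers (dcterms).
DC_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Partial, illustrative refinement-to-element table. Because each refinement
# refines exactly one unqualified element, a flat lookup is sufficient.
REFINEMENT_TO_ELEMENT = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "modified": "date",
    "extent": "format",
    "medium": "format",
    "hasPart": "relation",
    "isPartOf": "relation",
    "spatial": "coverage",
    "temporal": "coverage",
}


def split_tag(tag):
    """Split an ElementTree tag of the form '{uri}local' into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def dumb_down(record_xml):
    """Return (dc_element, value) pairs for the direct children of a record.

    dc:* elements pass through unchanged; known dcterms:* refinements are
    mapped to their unqualified element; anything else is skipped.
    """
    root = ET.fromstring(record_xml)
    pairs = []
    for child in root:
        uri, local = split_tag(child.tag)
        value = (child.text or "").strip()
        if uri == DC_NS:
            pairs.append((local, value))
        elif uri == DCTERMS_NS and local in REFINEMENT_TO_ELEMENT:
            pairs.append((REFINEMENT_TO_ELEMENT[local], value))
    return pairs


# Hypothetical record wrapper; only the dc/dcterms children matter here.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>Original title</dc:title>
  <dcterms:alternative>Translated title</dcterms:alternative>
  <dcterms:temporal>A qualified Coverage</dcterms:temporal>
</record>"""

if __name__ == "__main__":
    for element, value in dumb_down(SAMPLE):
        print(element, "=", value)
```

Running it reports both the original and the translated title under `title`, and the temporal value under `coverage`, which is the "dumbing down" a service provider would apply to harvested records.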
-Gary From martin.wynne at OTA.AHDS.AC.UK Fri Nov 15 13:58:44 2002 From: martin.wynne at OTA.AHDS.AC.UK (Martin Wynne) Date: Fri, 15 Nov 2002 13:58:44 -0000 Subject: OLAC DTD Message-ID: <FRI.15.NOV.2002.135844.0000.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> Where can I find a copy of the file olacrep.dtd? From Gary_Simons at SIL.ORG Fri Nov 15 14:57:13 2002 From: Gary_Simons at SIL.ORG (Gary Simons) Date: Fri, 15 Nov 2002 08:57:13 -0600 Subject: OLAC DTD Message-ID: <FRI.15.NOV.2002.085713.0600.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> On 11/15/2002 07:58:44 AM Martin Wynne wrote: >Where can I find a copy of the file olacrep.dtd? That sounds like an early name of the DTD for an OLAC repository in XML. That is now replaced by an XML schema: http://www.language-archives.org/OLAC/0.4/oryx.xsd If you really do need the historical artifact, it appears to still be posted on the site (with a version date of 28 Jun 2001) at: http://www.language-archives.org/tools/xsl/olacrep.dtd -Gary Simons From sb at CS.MU.OZ.AU Thu Nov 21 07:17:00 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Thu, 21 Nov 2002 02:17:00 EST Subject: Peer review of archives in preparation for workshop In-Reply-To: Your mail dated Friday 8 November, 2002. Message-ID: <THU.21.NOV.2002.021700.EST.SB@CS.MU.OZ.AU> Folks, Please note that the archive reviews are due this Friday, 22 November. For information on the reviewing assignments, please see: http://www.language-archives.org/events/olac02/reviews.html The original announcement follows. Thanks, Steven Bird > Dear workshop participants, > > You will recall that the third goal of our upcoming workshop in December, > as stated in the Workshop Overview on the web site, is: > > 3. Review: To give feedback to each participating archive on its use of > metadata, to review the services on the OLAC and LINGUIST sites. 
> > We have also warned you that we wanted each participant to do some > preparatory tasks prior to the workshop, including reviewing metadata from > three archives besides your own. > > Joan Spanne, the archivist for SIL International, has agreed to help us by > collating the results of these individual archive reviews and to make a > presentation on the "State of the Archives" at the workshop. In addition > to the benefit to each archive of getting constructive peer review, we > anticipate that another key outcome will be improvements to our metadata > guidelines and identification of more best practice recommendations. > > In order to facilitate this review process, we have worked with Joan to > develop a peer review form, which is attached. We have also worked out > specific review assignments. Each workshop participant has been assigned > to review specific archives. Consult the following web page to see which > archives you have been assigned to: > > http://www.language-archives.org/events/olac02/reviews.html > > The full instructions on how to perform the review are given in the > attached review form (which is also accessible via a link at the top of the > web page just mentioned). This should not be a time consuming process. We > anticipate that a single review can be completed within 30 minutes. You may > also need to spend some time familiarizing yourself again with the relevant > OLAC standards. Links to these are given in the detailed instructions. > > The reviews for each archive will be collated and sent to the contact > person for the archive as anonymous reviews. Of course, the web page of > review assignments gives some clue as to who reviewers might be, but it > will be impossible to know exactly who said what, so we trust there will be > an adequate level of anonymity. The actual anonymity will be increased by > the fact that there will often be reviewers in addition to the ones named > on the assignment page. 
After the due date, we will ask some of you who > have shown a knack for this sort of review to fill in some gaps left by > reviews that may not have come in. You are also encouraged to submit > reviews of any additional archives you please at your own initiative. > > The deadline for submission of completed reviews is two weeks from today, > FRIDAY, 22 NOVEMBER 2002. And early returns will be appreciated, too! > Address completed reviews to: > > joan_spanne at sil.org, olac-admin at language-archives.org > > We look forward to good feedback from all of you. Don't hesitate to > contact us if you have any questions. > > Best, > > Gary Simons (and Steven Bird) From sb at CS.MU.OZ.AU Fri Nov 22 08:31:13 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Fri, 22 Nov 2002 03:31:13 EST Subject: Vida for OAI-PMH 2.0 Message-ID: <FRI.22.NOV.2002.033113.EST.SB@CS.MU.OZ.AU> Folks, The current version of Vida, http://www.language-archives.org/vida, implements version 1.1 of the OAI protocol. I have created a beta version of Vida2 that implements version 2.0 of the protocol. It is not a full implementation since it does not generate all the error responses. However, it should be enough for people who want to expose their OLAC XML files to current OAI harvesters. Please see: http://www.language-archives.org/vida2 Note that the OAI is developing their own, general version of Vida, to be made available in December. We may be able to use that instead of our own vida2, and avoid the trouble of tracking future changes to the protocol. -Steven Bird From ruyng at GATE.SINICA.EDU.TW Fri Nov 22 11:49:03 2002 From: ruyng at GATE.SINICA.EDU.TW (Ru-Yng Chang) Date: Fri, 22 Nov 2002 06:49:03 -0500 Subject: experimental schema:type.functionality Message-ID: <FRI.22.NOV.2002.064903.0500.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> There are some differences from the Application(s) of LDC: message understanding. pronunciation modeling. prosody. speaker identification. speaker verification.
topic detection and tracking. I'm not sure whether these are appropriate. ruyng From baden at COMPULING.NET Fri Nov 22 15:17:21 2002 From: baden at COMPULING.NET (Baden Hughes) Date: Sat, 23 Nov 2002 01:17:21 +1000 Subject: experimental schema:type.functionality In-Reply-To: <OLAC-IMPLEMENTERS%2002112206490408@LISTSERV.LINGUISTLIST.ORG> Message-ID: <SAT.23.NOV.2002.011721.1000.> Hi Ru-Yng Chang, Thanks for your comments. In the new version of OLAC-Functionality available at http://www.compuling.net/projects/olac/ the inclusion of these types is mostly completed by the use of the HLT Survey categories (document in preparation). Regards Baden > -----Original Message----- > From: OLAC Implementers List > [mailto:OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG] On > Behalf Of Ru-Yng Chang > Sent: Friday, 22 November 2002 21:49 > To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG > Subject: Re: experimental schema:type.functionality > > > There are some differences from the Application(s) of LDC: > > message understanding. > pronunciation modeling. > prosody. > speaker identification. > speaker verification. > topic detection and tracking. > > I'm not sure whether these are appropriate. > > ruyng > From sb at CS.MU.OZ.AU Fri Nov 22 22:02:36 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Fri, 22 Nov 2002 17:02:36 EST Subject: Workshop preparation Message-ID: <FRI.22.NOV.2002.170236.EST.SB@CS.MU.OZ.AU> Folks, Please keep the archive reviews coming. They are providing a valuable and timely critique of our archives and our infrastructure, and will help us make well-informed decisions at the workshop. We'll take late ones, but the sooner the better of course... There are many other preparation activities, such as reviewing the new controlled vocabulary documents and testing the vocabularies on your archives.
A list of these activities is posted at: http://www.language-archives.org/events/olac02/preparation.html People who won't be attending the meeting are particularly encouraged to make your voices heard on the mailing lists, both this one, OLAC-Implementers, and the METADATA list (links on the above page). Thanks, Steven Bird From Gary_Simons at SIL.ORG Mon Nov 25 15:11:41 2002 From: Gary_Simons at SIL.ORG (Gary Simons) Date: Mon, 25 Nov 2002 09:11:41 -0600 Subject: Archive reviews Message-ID: <MON.25.NOV.2002.091141.0600.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> Dear colleagues, For those of you who will be attending the workshop in Philadelphia, our deadline for submission of your archive reviews has now come and gone. Today and tomorrow were the main days we had scheduled for compiling the results. The good news is that we have received reviews from about 40% of you and thus have plenty to get started with. The bad news, however, is that most of you still have not sent something in. Ideally, we would like to get your submissions today, but even if you can't manage that, we still want you to send them in whenever you can since your reviews contain valuable feedback for the archives you are reviewing. See you in two weeks, Gary Simons From hdry at LINGUISTLIST.ORG Sun Nov 3 15:29:40 2002 From: hdry at LINGUISTLIST.ORG (Helen Aristar Dry) Date: Sun, 3 Nov 2002 10:29:40 -0500 Subject: Some comments on the LINGUIST service provider In-Reply-To: <200210030138.g931cmM07394@unagi.cis.upenn.edu> Message-ID: <SUN.3.NOV.2002.102940.0500.> Thanks, very much, Steven. I think you're definitely right on the last two counts and probably on the first. We added the search blank on the first page at your suggestion before LREC, remember? We thought it was a good idea. But it did mean that we now had three levels of search: search blank, simple search, and advanced search. 
I've been meaning to do exactly what you suggest--collapse simple and advanced--but had let it slip down on my infinitely long to-do list. You know how it is! Thanks for the nudge! I think the first page could indeed be shortened. And your idea of adding a "more about OLAC" link is a good one. But remember that you're coming from a situation where you already know what OLAC is. Most of our readers haven't a clue. So maybe some explanation is really necessary. We'll take another look . . . Again, thanks for the help. It's always useful to have your feedback. Hope OZ is treating you well. We miss having you just down the road.... -Helen Date sent: Wed, 2 Oct 2002 21:38:48 EDT Send reply to: Steven Bird <sb at cs.mu.oz.au> From: Steven Bird <sb at CS.MU.OZ.AU> Organization: University of Melbourne Subject: Some comments on the LINGUIST service provider To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG > One of the workshop preparatory tasks is: > > > 3. Review: choose three participating archives besides your own and > > suggest improvements to their use of metadata; review the > > www.language-archives.org site and the www.linguistlist.org/olac/ > > service and suggest improvements. > > I have three low-level comments on the LINGUIST service provider. I hope > this feedback will make the service even better than it already is... > > a) The first page you come to is a long document with a search form some > way down. > > I'd favor a very simple page (cf www.google.com) consisting of a search > box, a link to the advanced search, and a link to "more about OLAC" which > has all the original text. > > b) Users wanting "more powerful search" are directed to the "OLAC Query > page". (Weren't we just on an OLAC query page?) Arriving on this new page, > we see that it is called "OLAC Query Form: Simple Search". This is > confusing, since we've just come from a simple search page expecting the > more powerful search page, only to find that this is still only simple > search. 
There's no pointer back to the really simple search. > > I'd prefer this to be called "Advanced Search" (both on the title and the > incoming link), with a backpointer to the simple search. > > c) This second page points to yet another page, called Advanced Search. > However, this generates an error: "ODBC Error Code = S1000 (General error) > [TCX][MyODBC]Table 'OLAC.alltypes' doesn't exist". I expect this really > advanced search permits search on all fields. > > I'm not convinced we need three levels of search. Could the second and > third levels be collapsed into a single level, containing all the search > fields? > > Does anyone else have comments on this service? > > -Steven > > -- > Steven Bird Email: <sb at cs.mu.oz.au> Web: http://www.cs.mu.oz.au/~sb/ > A/Prof, Dept of Computer Science, University of Melbourne, Vic 3010, AUSTRALIA > Senior Research Assoc, Linguistic Data Consortium, University of Pennsylvania From hdry at LINGUISTLIST.ORG Sun Nov 3 16:04:30 2002 From: hdry at LINGUISTLIST.ORG (Helen Aristar Dry) Date: Sun, 3 Nov 2002 11:04:30 -0500 Subject: Some comments on the LINGUIST service provider In-Reply-To: <005401c26ac7$142f1020$c800000a@50bneave.net> Message-ID: <SUN.3.NOV.2002.110430.0500.> OOOPs...thanks very much, Baden. We will deal with the broken link asap. About the search suggestion: our "simple search" was intended to do exactly what you describe . . . with the addition of the ability to search by language, which we thought many linguists would want. The addition of a single search blank on the home page has sort of unbalanced the system--as Steven points out, 3 levels of search seem too much. But he suggests having a search blank, plus a full search. I guess I just need to think about whether there's some way to do both what he suggests and what you suggest. I don't see it right now. But I really appreciate the feedback. Thanks for taking the time to make these suggestions. 
All the best, -Helen Date sent: Thu, 3 Oct 2002 20:24:37 +1000 Send reply to: baden at compuling.net From: Baden Hughes <baden at COMPULING.NET> Organization: CompuLing Subject: Re: Some comments on the LINGUIST service provider To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG > >From dealing with some new end users who have been introduced to OLAC > via the Linguist interface, I've got a couple of related comments. > > Users would like to have a simple search - by title, author, description > and subject language. This would mean author would be added to the > existing Quick Search. > > There is a difference between the number of archives actively searched > on the LL site and those registered at the OLAC site. I would have > assumed automated harvesting of the new archives as they are registered > at either location ? > > > An ultra-low level comment, when you click on the link at the bottom of > the LinguistList OLAC page: > > "If you would like to help with the OLAC enterprise, please let us know! > > Thank you in advance for your help! " > > An email message is launched, but there's no email address to send > things to (ie mailto: is malformed). > > Baden From sb at CS.MU.OZ.AU Mon Nov 4 01:52:55 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Sun, 3 Nov 2002 20:52:55 EST Subject: Some comments on the LINGUIST service provider In-Reply-To: Your mail dated Sunday 3 November, 2002. Message-ID: <SUN.3.NOV.2002.205255.EST.SB@CS.MU.OZ.AU> Helen et al, The new search pages on the LINGUIST site are greatly improved! Here are some further suggestions concerning layout and features. I'm sending them to the list to see if others have opinions about any of this. First, I think the search field should precede the descriptive text, so that nothing at all gets in the way of users doing searches (cf search engines). As it is there is already a fixed banner and sidebar that the user has to skip over. Second, the search section is ambiguous. 
Should the user click on "Search the OLAC catalog" or should they enter keywords and click the search button? I think the search icon could be omitted, since you already have a link to advanced search. Third, it would be helpful to be able to browse the contents of an individual archive. This is akin to the archive selection you can make on the advanced form, but it doesn't require extra search fields and could be made more accessible by putting it on the front page. The interface could parallel the keyword search field: Keyword(s): [TEXT FIELD] (search button) Browse: [ARCHIVE MENU] (browse button) Fourth, in cases where there are several screenfuls of hits, it would be helpful to be able to sort the search results by title, creator, date, subject language, archive, or type. This just requires an "order by" clause on the SQL side, and could be presented as a pulldown menu on the advanced search form and on the search results form. Anyway, there are some ideas I'm throwing out. Thanks for the service you're already providing. Steven Bird From sb at CS.MU.OZ.AU Mon Nov 4 06:47:27 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Mon, 4 Nov 2002 01:47:27 EST Subject: Where to host OLAC schemas Message-ID: <MON.4.NOV.2002.014727.EST.SB@CS.MU.OZ.AU> Recently there was some discussion about hosting the schemas for certain OLAC vocabularies off-site at a location that is directly managed by the author of the schema, mainly because this would be more convenient for the author. After some consideration, Gary and I now believe that these documents should be hosted on the OLAC site. Here are several reasons why: 1. Only mature vocabularies achieve community-wide acceptance, and we anticipate a slow rate of change by the time a vocabulary is approved at the OLAC level. Therefore the updates to the schema will be infrequent.
The process of installing a revised schema on the OLAC site would involve a second level of quality control, and this is appropriate for an OLAC standard that many other systems depend on. 2. Any recommendations and implementation notes pertaining to the vocabulary will already be posted on the OLAC site. New versions of these documents may need to be created when a vocabulary is updated. Transferring an updated schema to the OLAC site does not represent a significant additional burden. 3. When several sites host official OLAC schemas, only one of the sites has to be down in order for validation to be impossible. 4. People can see that a schema is endorsed by OLAC simply from its location, rather than having to inspect the XML schema for the OLAC metadata format. If OLAC vocabularies are ever adopted into other DCMI application profiles then the link to OLAC is maintained. 5. We draw a clear division between OLAC extensions and third-party extensions, when schemas that are approved by the community are put in the OLAC namespace and hosted on the OLAC site. 6. Posting material from many authors on the OLAC site demonstrates that building OLAC is a community effort. Additionally, authors of OLAC vocabularies and the associated schemas are able to demonstrate the impact of their work when they can point to their documents posted on the OLAC site. Convinced? Please let us know what you think. Thanks, Steven Bird From baden at COMPULING.NET Mon Nov 4 09:11:27 2002 From: baden at COMPULING.NET (Baden Hughes) Date: Mon, 4 Nov 2002 19:11:27 +1000 Subject: Updated Schemas for OLAC Language Technology Description Message-ID: <MON.4.NOV.2002.191127.1000.> I've updated several of the experimental schemas I posted at http://www.compuling.net/projects/olac The major update is that the OLAC-Functionality schema now reflects the classifications provided at LT-World (http://www.lt-world.org) rather than the earlier ones from the HLT Survey. Comments welcome as usual.
Regards Baden From baden at COMPULING.NET Tue Nov 5 06:30:30 2002 From: baden at COMPULING.NET (Baden Hughes) Date: Tue, 5 Nov 2002 16:30:30 +1000 Subject: Where to host OLAC schemas In-Reply-To: <200211040647.gA46lRM12099@unagi.cis.upenn.edu> Message-ID: <TUE.5.NOV.2002.163030.1000.> > 1. Only mature vocabularies achieve community-wide > acceptance, and we anticipate a slow rate of change by the > time a vocabulary is approved at the OLAC level. Therefore > the updates to the schema will be infrequent. The process of > installing a revised schema on the OLAC site would involve a > second level of quality control, and this is appropriate for > an OLAC standard that many other systems depend on. In essence, it sounds as if you're proposing an editorial review of changes to core schemas. As with similar community based standards initiatives, I think this is a welcome development. Certainly it should enhance the quality of the schemas that do make it into the OLAC standard. The next question that leads from this is "who will do this review?" Additionally, some form of centralised version control would be advantageous in this context. > 2. Any recommendations and implementation notes pertaining to > the vocabulary will already be posted on the OLAC site. New > versions of these documents may need to be created when a > vocabulary is updated. Transferring an updated schema to the > OLAC site does not represent a significant additional burden. Agreed, especially if there is a documented process by which revisions are proposed, implemented and disseminated. > 3. When several sites host official OLAC schemas, only one of > the sites has to be down in order for validation to be impossible. This is probably the most significant point. If schemas developed by individuals are encouraged it is likely that for various reasons there may be times when the hosting site is unavailable owing to the infrastructure choices made by individuals.
Institutions have a greater chance of providing high availability network and hardware solutions to serve this purpose. However ... the same criticism can still be levelled even at this approach - if all the schemas are hosted on the OLAC server, then it also becomes a single point of failure with regard to validation. A better approach would be to have some kind of mirroring arrangement (this is probably beyond the scope of this list) which would ensure that multiple sites held the authoritative version of a schema and could be switched between as necessary. > 4. People can see that a schema is endorsed by OLAC simply > from its location, rather than having to inspect the XML > schema for the OLAC metadata format. If OLAC vocabularies > are ever adopted into other DCMI application profiles then > the link to OLAC is maintained. This is the most important of the points raised IMHO. The fact that interoperability comes for free by using this process is of community wide benefit. > 5. We draw a clear division between OLAC extensions and > third-party extensions, when schemas that are approved by the > community are put in the OLAC namespace and hosted on the OLAC site. > > 6. Posting material from many authors on the OLAC site > demonstrates that building OLAC is a community effort. > Additionally, authors of OLAC vocabularies and the associated > schemas are able to demonstrate the impact of their work when > they can point to their documents posted on the OLAC site. This will serve to encourage innovation and excellence. As third party schemas are tested and improved, the OLAC process allows for consensus to be formed about the adoption of such schemas into the OLAC namespace. In other words, kudos for writing a good schema implementation. As a third party schema developer, this all sounds reasonable to me.
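Seen from the validator's side, the mirroring arrangement suggested above amounts to trying several copies of the same schema in turn and using the first one that can be fetched. The following Python sketch assumes such mirrors exist: only the first URL comes from this list (Gary's oryx.xsd location), while the second URL and the helper names `fetch_schema` and `_http_get` are hypothetical. It also does not check that the mirrors hold identical copies, which a real arrangement would need.

```python
import urllib.request
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical mirror list: the first URL is the OLAC 0.4 schema location
# mentioned on this list; the second stands in for an institutional mirror.
SCHEMA_MIRRORS = [
    "http://www.language-archives.org/OLAC/0.4/oryx.xsd",
    "http://mirror.example.org/OLAC/0.4/oryx.xsd",
]


def _http_get(url: str) -> bytes:
    """Download a URL, giving up after 10 seconds."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_schema(
    mirrors: Sequence[str],
    fetch: Optional[Callable[[str], bytes]] = None,
) -> Tuple[str, bytes]:
    """Return (url, content) from the first mirror that responds.

    URLError, HTTPError and socket timeouts are all subclasses of OSError,
    so any of them makes us fall through to the next mirror.
    """
    get = fetch or _http_get
    errors = []
    for url in mirrors:
        try:
            return url, get(url)
        except OSError as exc:
            errors.append(f"{url}: {exc}")
    raise RuntimeError("no schema mirror reachable:\n" + "\n".join(errors))
```

Usage would be `url, xsd = fetch_schema(SCHEMA_MIRRORS)`. The optional `fetch` argument lets the fallback logic be exercised without network access; with a single mirror in the list, the function reduces to the single point of failure described above.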
Of course, there's probably a point at which I should be following some "best practice" in terms of schema development and promotion, but I guess we can add that to the list of things to be done :-) Baden From sb at CS.MU.OZ.AU Wed Nov 6 06:37:16 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Wed, 6 Nov 2002 01:37:16 EST Subject: workshop program Message-ID: <WED.6.NOV.2002.013716.EST.SB@CS.MU.OZ.AU> Folks, A draft program is now posted on the workshop website at: http://www.language-archives.org/events/olac02/ We've structured the program to maximize the chances of reaching consensus on the core infrastructure and of getting to the point where we can launch version 1.0 of the metadata set including a core set of vocabularies. We also want this meeting to set the agenda for 2003. As advertised, there is no time for individual conference-style paper presentations. However, the meeting will open with a session in which every archive and service will be briefly introduced, and there will be ample informal discussion time for this (2.5 hours of breaks during the day and free evenings). The second half of Wednesday afternoon is scheduled as an open forum for new initiatives of community-wide interest, so please keep the ideas coming. Please let us know if we've left anything out, or if you think we need to adjust time allocations etc. Thanks, Steven Bird From haejoong at UNAGI.CIS.UPENN.EDU Wed Nov 6 18:25:59 2002 From: haejoong at UNAGI.CIS.UPENN.EDU (Haejoong Lee) Date: Wed, 6 Nov 2002 13:25:59 -0500 Subject: OLAC suite updates Message-ID: <WED.6.NOV.2002.132559.0500.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> Dear OLAC implementers, A new OLAC suite has been released with new features! The OLAC suite is a combination of an OLAC harvester (Ovester) and an aggregator (OLACA), written in Perl. Ovester harvests OAI records from OLAC data providers, and stores them in the OLAC MySQL database. OLACA exports the records in the OLAC database using OAI-PMH.
They can be used to implement useful services over OLAC archives. Please check the new features below. For more details, please read the ChangeLog and README included in the OLAC suite package. New features: Ovester (OLAC harvester) - central db lookup for harvest list: Ovester tries to download the harvest list from the remote server (language-archives web server) by default. - database synchronization: The archives in the OLAC database are synchronized with those in the harvest list, i.e. archives that are not listed in the harvest list are removed from the database. - use of a separate database account information file: The database account information is not hardcoded in the Ovester code, but kept in a separate file for security. - The features listed above are controlled by command line options. For example, you can turn the synchronization on or off using the -p option. OLACA (OLAC Aggregator) - Query verb: A new verb, "Query", has been added. Please see http://www.language-archives.org/NOTE/query.html - OAI-PMH 2.0: OLAC suite 2 contains an OAI-PMH 2.0 compliant OLACA. Downloads: OLAC suite (Ovester:OAI-PMH 1.1, OLACA:OAI-PMH 1.1): http://www.language-archives.org/tools/olac_suite.tgz OLAC suite 2 (Ovester:OAI-PMH 1.1, OLACA:OAI-PMH 2.0): http://www.language-archives.org/tools/olaca_suite2.tgz Thanks, Haejoong From Gary_Simons at SIL.ORG Fri Nov 8 17:51:36 2002 From: Gary_Simons at SIL.ORG (Gary Simons) Date: Fri, 8 Nov 2002 11:51:36 -0600 Subject: Peer review of archives in preparation for workshop Message-ID: <FRI.8.NOV.2002.115136.0600.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> Dear workshop participants, You will recall that the third goal of our upcoming workshop in December, as stated in the Workshop Overview on the web site, is: 3. Review: To give feedback to each participating archive on its use of metadata, to review the services on the OLAC and LINGUIST sites.
We have also warned you that we wanted each participant to do some preparatory tasks prior to the workshop, including reviewing metadata from three archives besides your own. Joan Spanne, the archivist for SIL International, has agreed to help us by collating the results of these individual archive reviews and to make a presentation on the "State of the Archives" at the workshop. In addition to the benefit to each archive of getting constructive peer review, we anticipate that another key outcome will be improvements to our metadata guidelines and identification of more best practice recommendations. In order to facilitate this review process, we have worked with Joan to develop a peer review form, which is attached. We have also worked out specific review assignments. Each workshop participant has been assigned to review specific archives. Consult the following web page to see which archives you have been assigned to: http://www.language-archives.org/events/olac02/reviews.html The full instructions on how to perform the review are given in the attached review form (which is also accessible via a link at the top of the web page just mentioned). This should not be a time consuming process. We anticipate that a single review can be completed within 30 minutes. You may also need to spend some time familiarizing yourself again with the relevant OLAC standards. Links to these are given in the detailed instructions. The reviews for each archive will be collated and sent to the contact person for the archive as anonymous reviews. Of course, the web page of review assignments gives some clue as to who reviewers might be, but it will be impossible to know exactly who said what, so we trust there will be an adequate level of anonymity. The actual anonymity will be increased by the fact that there will often be reviewers in addition to the ones named on the assignment page. 
After the due date, we will ask some of you who have shown a knack for this sort of review to fill in some gaps left by reviews that may not have come in. You are also encouraged to submit reviews of any additional archives you please at your own initiative. The deadline for submission of completed reviews is two weeks from today, FRIDAY, 22 NOVEMBER 2002. And early returns will be appreciated, too! Address completed reviews to: joan_spanne at sil.org, olac-admin at language-archives.org We look forward to good feedback from all of you. Don't hesitate to contact us if you have any questions. Best, Gary Simons (and Steven Bird) (See attached file: review-form.txt) (See attached file: review-form.txt) -------------- next part -------------- A non-text attachment was scrubbed... Name: review-form.txt Type: application/octet-stream Size: 3457 bytes Desc: not available URL: <http://listserv.linguistlist.org/pipermail/olac-implementers/attachments/20021108/da956b38/attachment.obj> From Alexis.Dimitriadis at LET.UU.NL Sat Nov 9 10:36:53 2002 From: Alexis.Dimitriadis at LET.UU.NL (Dimitriadis, Alexis) Date: Sat, 9 Nov 2002 11:36:53 +0100 Subject: Peer review of archives in preparation for workshop Message-ID: <SAT.9.NOV.2002.113653.0100.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> Hi, I just looked at the archive list, and I do not see my name as a reviewer. Perhaps this is because, as I discovered just yesterday, I was not subscribed to the OLAC-IMPLEMENTERS list. (There was enough material coming in from OLAC-METADATA that I did not realize I was missing something important). I will be attending the workshop as a representative of the TDS/LTRC group in the Netherlands, and would very much like to do my part! 
Alexis _____________________________________________ Alexis Dimitriadis alexis.dimitriadis at let.uu.nl +31-30-253-6219 Utrecht Institute of Linguistics OTS Trans 10 3512 JK Utrecht The Netherlands -----Original Message----- From: Gary Simons [mailto:Gary_Simons at SIL.ORG] Sent: Friday, 08 November, 2002 18:52 To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG Subject: Peer review of archives in preparation for workshop Dear workshop participants, You will recall that the third goal of our upcoming workshop in December, as stated in the Workshop Overview on the web site, is: 3. Review: To give feedback to each participating archive on its use of metadata, to review the services on the OLAC and LINGUIST sites. We have also warned you that we wanted each participant to do some preparatory tasks prior to the workshop, including reviewing metadata from three archives besides your own. Joan Spanne, the archivist for SIL International, has agreed to help us by collating the results of these individual archive reviews and to make a presentation on the "State of the Archives" at the workshop. In addition to the benefit to each archive of getting constructive peer review, we anticipate that another key outcome will be improvements to our metadata guidelines and identification of more best practice recommendations. In order to facilitate this review process, we have worked with Joan to develop a peer review form, which is attached. We have also worked out specific review assignments. Each workshop participant has been assigned to review specific archives. Consult the following web page to see which archives you have been assigned to: http://www.language-archives.org/events/olac02/reviews.html The full instructions on how to perform the review are given in the attached review form (which is also accessible via a link at the top of the web page just mentioned). This should not be a time consuming process. We anticipate that a single review can be completed within 30 minutes. 
You may also need to spend some time familiarizing yourself again with the relevant OLAC standards. Links to these are given in the detailed instructions. The reviews for each archive will be collated and sent to the contact person for the archive as anonymous reviews. Of course, the web page of review assignments gives some clue as to who reviewers might be, but it will be impossible to know exactly who said what, so we trust there will be an adequate level of anonymity. The actual anonymity will be increased by the fact that there will often be reviewers in addition to the ones named on the assignment page. After the due date, we will ask some of you who have shown a knack for this sort of review to fill in some gaps left by reviews that may not have come in. You are also encouraged to submit reviews of any additional archives you please at your own initiative. The deadline for submission of completed reviews is two weeks from today, FRIDAY, 22 NOVEMBER 2002. And early returns will be appreciated, too! Address completed reviews to: joan_spanne at sil.org, olac-admin at language-archives.org We look forward to good feedback from all of you. Don't hesitate to contact us if you have any questions. Best, Gary Simons (and Steven Bird) (See attached file: review-form.txt) From Alexis.Dimitriadis at LET.UU.NL Sat Nov 9 10:49:11 2002 From: Alexis.Dimitriadis at LET.UU.NL (Dimitriadis, Alexis) Date: Sat, 9 Nov 2002 11:49:11 +0100 Subject: Oops! Message-ID: <SAT.9.NOV.2002.114911.0100.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> My apologies, I did not intend to send the last message to the entire list! Alexis -----Original Message----- From: Dimitriadis, Alexis [mailto:Alexis.Dimitriadis at let.uu.nl] Sent: Saturday, 09 November, 2002 11:37 To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG Subject: Re: Peer review of archives in preparation for workshop ...
From sb at CS.MU.OZ.AU Sun Nov 10 23:52:12 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Sun, 10 Nov 2002 18:52:12 EST Subject: New report on OLAC infrastructure Message-ID: <SUN.10.NOV.2002.185212.EST.SB@CS.MU.OZ.AU> Folks, Gary Simons and I have recently written a 12-page report containing a comprehensive and up-to-date overview of OLAC technical infrastructure. The final draft, now under review, is posted at: http://www.language-archives.org/docs/lht-draft.pdf Although this paper is directed at a wider digital libraries audience, we hope it will help workshop participants prepare for our technical discussions. Comments welcomed. Steven Bird From churen at GATE.SINICA.EDU.TW Tue Nov 12 07:49:08 2002 From: churen at GATE.SINICA.EDU.TW (Chu-Ren Huang) Date: Tue, 12 Nov 2002 02:49:08 EST Subject: Suggestion for adding a Proofreader role Message-ID: <TUE.12.NOV.2002.024908.EST.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> Dear All: We would like to suggest the addition of a Proofreader role to OLACMC. This suggestion was originally made in a paper (Ru-Yng Chang and Chu-Ren Huang. 2002. OLACMS: Comparisons and Applications in Chinese and Formosan Languages. Proceedings of The 3rd Workshop on Asian Language Resources and International Standardization, A Post-COLING2002 workshop.) Since all digitized material must be proofread to ensure quality, it is important to know who the proofreader is. A proofreader should be able to refer to an individual, a team, or a standard procedure. This is especially crucial for heritage data, since a competent proofreader may need reading skills in a language that is no longer in use. Take classical Chinese texts (or Latin) as an example. To construct an archive based on classical Chinese texts, input and proofreading cannot be avoided.
The standard procedure here at Academia Sinica involves inputting the text twice (manually or automatically, but by different teams), running an automatic proofreading program on the two versions, and then having human proofreaders go over the differences for at least three more runs (while proofreading the identical parts quickly). Such a standard procedure can ensure data integrity and accuracy to a very high standard (less than 1% error remains). The more traditional way requires 7 runs of proofreading, with one proofreader reading the text backwards (to avoid automatic self-correction in reading). In other words, a text proofread by the AS team should be highly reliable, while a text archived by a Chinese graduate student, input and proofread by him/herself, may not reach the same level. Chu-Ren From sb at CS.MU.OZ.AU Tue Nov 12 08:04:00 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Tue, 12 Nov 2002 03:04:00 EST Subject: Suggestion for adding a Proofreader role In-Reply-To: Your mail dated Tuesday 12 November, 2002. Message-ID: <TUE.12.NOV.2002.030400.EST.SB@CS.MU.OZ.AU> I forwarded this to the METADATA list - please respond to this posting there. http://lists.linguistlist.org/archives/metadata.html -Steven Bird From martin.wynne at OTA.AHDS.AC.UK Wed Nov 13 16:39:35 2002 From: martin.wynne at OTA.AHDS.AC.UK (Martin Wynne) Date: Wed, 13 Nov 2002 16:39:35 -0000 Subject: A simpler format for OLAC vocabularies and schemes Message-ID: <WED.13.NOV.2002.163935.0000.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> Gary, Sorry about my long response time, but I'm just catching up on mail from the last month or so on this list in preparation for the workshop. The following is with reference to Steve and Gary's postings from 31st October.
If I understand correctly, the proposal is to move from: <title>Original title Translated title to: Original title Translated title I've checked with the relevant DCMI website and this does indeed seem to be in conformance with their recommendations. Now perhaps in that case I should take this up with the DCMI and not with OLAC... What I can't see is why the element is neither embedded in the element, nor identified as a type of title element (as in OLAC 0.4). To put it bluntly, a human can't see what it is an alternative type of, since it is not tagged in any obvious way as a title. Does this require that "alternative" be defined (somewhere?) as a type of title? And in this case, shouldn't it be called "alternativeTitle" or something more transparent? Up to this point I am heartily in accordance with the suggestions for simplifying the formats. I agree that syntactic conformance with DC is a good thing, but they now seem to be going down a road which aims to flatten out any hierarchical organisation of the data classification, and makes human readability of the XML impossible. Further apologies if I've got the wrong end of the stick here by coming in belatedly to the discussion. Best, Martin From Gary_Simons at SIL.ORG Thu Nov 14 03:24:41 2002 From: Gary_Simons at SIL.ORG (Gary Simons) Date: Wed, 13 Nov 2002 21:24:41 -0600 Subject: A simpler format for OLAC vocabularies and schemes Message-ID: <WED.13.NOV.2002.212441.0600.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> Martin, This is a good question. Let me take a stab at answering: >If I understand correctly, the proposal is to move from: > > <title>Original title > Translated title > >to: > > Original title > Translated title > >I've checked with the relevant DCMI website and this does indeed seem to be >in conformance with their recommendations. Now perhaps in that case I should >take this up with the DCMI and not with OLAC...
What I can't see is why the > element is neither embedded in the element, nor identified >as a type of title element (as in OLAC 0.4). First, let me make sure everyone understands what dcterms is. It is the namespace for all of the refinements defined in the DC Qualifiers recommendation. Thus, there is also: <dcterms:hasPart>A qualified Relation</dcterms:hasPart> <dcterms:temporal>A qualified Coverage</dcterms:temporal> and so on > To put it bluntly, a human >can't see what it is an alternative type of, since it is not tagged in any >obvious way as a title. Does this require that "alternative" be defined >(somewhere?) as a type of title? And in this case, shouldn't it be called >"alternativeTitle" or something more transparent? It is quite true that the XML file gives no clue as to the corresponding non-qualified element. However, there is no ambiguity since each DC refinement is defined to occur with only one DC element. That is, <alternative> is defined to be a refinement of Title and nothing else, <temporal> of Coverage, and so on. The mapping problem is solved in implementation by adding to the harvested metadata database a table that pairs each refinement with its non-qualified element. This allows a service provider to "dumb down" the tags in the dcterms namespace to their dc equivalents. The standard OLAC harvester will have this built in. >Up to this point I am heartily in accordance with the suggestions for >simplifying the formats. I agree that syntactic conformance with DC is a >good thing, but they now seem to be going down a road which aims to flatten >out any hierarchical organisation of the data classification, and makes >human readability of the XML impossible. Note that the hierarchical organisation is in the classification scheme, and not in the data itself. That is why it is appropriate for data encoding to be "flattened". There only needs to be one instance of the classification hierarchy (e.g.
the database table I mention above, or the DCMI's RDF schema for dcterms), and a flattened tag can be looked up in that hierarchy rather than repeating the refinement-to-element mapping in every instance of the refinement. >Further apologies if I've got the wrong end of the stick here by coming in >belatedly to the discussion. I hope that makes sense. -Gary From martin.wynne at OTA.AHDS.AC.UK Fri Nov 15 13:58:44 2002 From: martin.wynne at OTA.AHDS.AC.UK (Martin Wynne) Date: Fri, 15 Nov 2002 13:58:44 -0000 Subject: OLAC DTD Message-ID: <FRI.15.NOV.2002.135844.0000.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> Where can I find a copy of the file olacrep.dtd? From Gary_Simons at SIL.ORG Fri Nov 15 14:57:13 2002 From: Gary_Simons at SIL.ORG (Gary Simons) Date: Fri, 15 Nov 2002 08:57:13 -0600 Subject: OLAC DTD Message-ID: <FRI.15.NOV.2002.085713.0600.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> On 11/15/2002 07:58:44 AM Martin Wynne wrote: >Where can I find a copy of the file olacrep.dtd? That sounds like an early name of the DTD for an OLAC repository in XML. That is now replaced by an XML schema: http://www.language-archives.org/OLAC/0.4/oryx.xsd If you really do need the historical artifact, it appears to still be posted on the site (with a version date of 28 Jun 2001) at: http://www.language-archives.org/tools/xsl/olacrep.dtd -Gary Simons From sb at CS.MU.OZ.AU Thu Nov 21 07:17:00 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Thu, 21 Nov 2002 02:17:00 EST Subject: Peer review of archives in preparation for workshop In-Reply-To: Your mail dated Friday 8 November, 2002. Message-ID: <THU.21.NOV.2002.021700.EST.SB@CS.MU.OZ.AU> Folks, Please note that the archive reviews are due this Friday, 22 November. For information on the reviewing assignments, please see: http://www.language-archives.org/events/olac02/reviews.html The original announcement follows. 
Thanks, Steven Bird > Dear workshop participants, > > You will recall that the third goal of our upcoming workshop in December, > as stated in the Workshop Overview on the web site, is: > > 3. Review: To give feedback to each participating archive on its use of > metadata, to review the services on the OLAC and LINGUIST sites. > > We have also warned you that we wanted each participant to do some > preparatory tasks prior to the workshop, including reviewing metadata from > three archives besides your own. > > Joan Spanne, the archivist for SIL International, has agreed to help us by > collating the results of these individual archive reviews and to make a > presentation on the "State of the Archives" at the workshop. In addition > to the benefit to each archive of getting constructive peer review, we > anticipate that another key outcome will be improvements to our metadata > guidelines and identification of more best practice recommendations. > > In order to facilitate this review process, we have worked with Joan to > develop a peer review form, which is attached. We have also worked out > specific review assignments. Each workshop participant has been assigned > to review specific archives. Consult the following web page to see which > archives you have been assigned to: > > http://www.language-archives.org/events/olac02/reviews.html > > The full instructions on how to perform the review are given in the > attached review form (which is also accessible via a link at the top of the > web page just mentioned). This should not be a time consuming process. We > anticipate that a single review can be completed within 30 minutes. You may > also need to spend some time familiarizing yourself again with the relevant > OLAC standards. Links to these are given in the detailed instructions. > > The reviews for each archive will be collated and sent to the contact > person for the archive as anonymous reviews. 
Of course, the web page of > review assignments gives some clue as to who reviewers might be, but it > will be impossible to know exactly who said what, so we trust there will be > an adequate level of anonymity. The actual anonymity will be increased by > the fact that there will often be reviewers in addition to the ones named > on the assignment page. After the due date, we will ask some of you who > have shown a knack for this sort of review to fill in some gaps left by > reviews that may not have come in. You are also encouraged to submit > reviews of any additional archives you please at your own initiative. > > The deadline for submission of completed reviews is two weeks from today, > FRIDAY, 22 NOVEMBER 2002. And early returns will be appreciated, too! > Address completed reviews to: > > joan_spanne at sil.org, olac-admin at language-archives.org > > We look forward to good feedback from all of you. Don't hesitate to > contact us if you have any questions. > > Best, > > Gary Simons (and Steven Bird) From sb at CS.MU.OZ.AU Fri Nov 22 08:31:13 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Fri, 22 Nov 2002 03:31:13 EST Subject: Vida for OAI-PMH 2.0 Message-ID: <FRI.22.NOV.2002.033113.EST.SB@CS.MU.OZ.AU> Folks, The current version of Vida, http://www.language-archives.org/vida, implements version 1.1 of the OAI protocol. I have created a beta version of Vida2 that implements version 2.0 of the protocol. It is not a full implementation since it does not generate all the error responses. However, it should be enough for people who want to expose their OLAC XML files to current OAI harvesters. Please see: http://www.language-archives.org/vida2 Note that the OAI is developing their own, general version of Vida, to be made available in December. We may be able to use that instead of our own vida2, and avoid the trouble of tracking future changes to the protocol. 
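For readers less familiar with what exposing OLAC XML files to OAI harvesters involves, here is a minimal sketch (Python, standard library only) of the harvester side of an OAI-PMH 2.0 exchange. It builds ListRecords requests, follows resumptionToken paging, and raises an exception on any <error> element the repository does return (as noted above, Vida2 does not generate all of them). The base URL http://example.org/oai, the sample response, and the fetch callable are hypothetical placeholders; the message does not give Vida2's actual request URL form, and this is not Vida2's own code.

```python
# Minimal sketch of an OAI-PMH 2.0 harvester loop (standard library only).
# Assumptions: the base URL is hypothetical, and `fetch` is any callable that
# maps a request URL to the repository's XML response text.
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from one response."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty <resumptionToken/> marks the last page of a list.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest(fetch, base_url, metadata_prefix="oai_dc"):
    """Collect identifiers across pages by following resumption tokens."""
    ids, token = parse_list_records(fetch(list_records_url(base_url, metadata_prefix)))
    while token:
        more, token = parse_list_records(fetch(list_records_url(base_url, resumption_token=token)))
        ids.extend(more)
    return ids


# A hand-written sample response: one record, with one more page to fetch.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-11-22T08:31:13Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:item1</identifier>
        <datestamp>2002-11-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(parse_list_records(SAMPLE))  # (['oai:example.org:item1'], 'page2')
```

In practice fetch could wrap urllib.request.urlopen(url).read().decode("utf-8"); passing it in as a parameter keeps the paging logic testable without network access. The default prefix oai_dc is used because every OAI-PMH repository must support unqualified Dublin Core.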
-Steven Bird From ruyng at GATE.SINICA.EDU.TW Fri Nov 22 11:49:03 2002 From: ruyng at GATE.SINICA.EDU.TW (Ru-Yng Chang) Date: Fri, 22 Nov 2002 06:49:03 -0500 Subject: experimental schema:type.functionality Message-ID: <FRI.22.NOV.2002.064903.0500.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> There are some differences from the Application(s) of LDC: message understanding. pronunciation modeling. prosody. speaker identification. speaker verification. topic detection and tracking. I'm not sure whether these are appropriate. ruyng From baden at COMPULING.NET Fri Nov 22 15:17:21 2002 From: baden at COMPULING.NET (Baden Hughes) Date: Sat, 23 Nov 2002 01:17:21 +1000 Subject: experimental schema:type.functionality In-Reply-To: <OLAC-IMPLEMENTERS%2002112206490408@LISTSERV.LINGUISTLIST.ORG> Message-ID: <SAT.23.NOV.2002.011721.1000.> Hi Ru-Yng Chang Thanks for your comments. In the new version of OLAC-Functionality available at http://www.compuling.net/projects/olac/ the inclusion of these types is mostly completed through the use of the HLT Survey categories (document in preparation). Regards Baden > -----Original Message----- > From: OLAC Implementers List > [mailto:OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG] On > Behalf Of Ru-Yng Chang > Sent: Friday, 22 November 2002 21:49 > To: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG > Subject: Re: experimental schema:type.functionality > > > There are some differences from the Application(s) of LDC: > > message understanding. > pronunciation modeling. > prosody. > speaker identification. > speaker verification. > topic detection and tracking. > > I'm not sure whether these are appropriate. > > ruyng > From sb at CS.MU.OZ.AU Fri Nov 22 22:02:36 2002 From: sb at CS.MU.OZ.AU (Steven Bird) Date: Fri, 22 Nov 2002 17:02:36 EST Subject: Workshop preparation Message-ID: <FRI.22.NOV.2002.170236.EST.SB@CS.MU.OZ.AU> Folks, Please keep the archive reviews coming.
They are providing a valuable and timely critique of our archives and our infrastructure, and will help us make well-informed decisions at the workshop. We'll take late ones, but the sooner the better of course... There are many other preparation activities, such as reviewing the new controlled vocabulary documents and testing the vocabularies on your archives. A list of these activities is posted at: http://www.language-archives.org/events/olac02/preparation.html People who won't be attending the meeting are particularly encouraged to make their voices heard on the mailing lists, both this one, OLAC-Implementers, and the METADATA list (links on the above page). Thanks, Steven Bird From Gary_Simons at SIL.ORG Mon Nov 25 15:11:41 2002 From: Gary_Simons at SIL.ORG (Gary Simons) Date: Mon, 25 Nov 2002 09:11:41 -0600 Subject: Archive reviews Message-ID: <MON.25.NOV.2002.091141.0600.OLACIMPLEMENTERS@LISTSERV.LINGUISTLIST.ORG> Dear colleagues, For those of you who will be attending the workshop in Philadelphia, our deadline for submission of your archive reviews has now come and gone. Today and tomorrow were the main days we had scheduled for compiling the results. The good news is that we have received reviews from about 40% of you and thus have plenty to get started with. The bad news, however, is that most of you still have not sent something in. Ideally, we would like to get your submissions today, but even if you can't manage that, we still want you to send them in whenever you can since your reviews contain valuable feedback for the archives you are reviewing. See you in two weeks, Gary Simons