From sb at UNAGI.CIS.UPENN.EDU Thu Jul 4 01:05:15 2002
From: sb at UNAGI.CIS.UPENN.EDU (Steven Bird)
Date: Wed, 3 Jul 2002 21:05:15 EDT
Subject: OLAC Web-Crawler Gateway
Message-ID:

ANNOUNCING THE OLAC WEB-CRAWLER GATEWAY

The OLAC Web-Crawler Gateway is an OLAC service provider which contains
the metadata from all the other 20+ registered OLAC archives. The
web-crawler gateway exports each OLAC metadata record as an HTML
document, permitting it to be indexed by web search engines. End-users
looking for language resources using conventional search engines will
fortuitously arrive at OLAC records.

Language archives whose content is accessible only via a search form
cannot be indexed by search engines. By participating in OLAC, their
content becomes accessible to people who use OLAC Service Providers, and
now also to people using conventional web search engines.

Here is the HTML presentation of the OLAC record oai:ldc:LDC93S1. It is
what an end-user would see just one click from the web search engine:

http://www.language-archives.org/tools/lookup.php4?identifier=oai:ldc:LDC93S1

Here is what the web search engine actually crawls:

http://www.language-archives.org:8082/dp9/

The web-crawler gateway is based on three existing pieces of software:

1. DP9, the OAI Web-Crawler Gateway, developed by Xiaoming Liu at
   Old Dominion University [http://arc.cs.odu.edu:8080/dp9/].
   The OLAC Web-Crawler Gateway is a port of DP9, hardcoded to serve
   OLAC records.

2. OLACA, the OLAC Aggregator, announced on this list last month
   [http://lists.linguistlist.org/cgi-bin/wa?A2=ind0206&L=olac-implementers].
   This is a special data provider which serves up all the records
   found in OLAC archives.

3. The OLAC Resolver, which maps the OAI identifier of an OLAC record
   to an HTML view of the record.
   This service is accessible from the OLAC tools page
   [http://www.language-archives.org/tools.html].

The OLAC Web-Crawler Gateway was developed by Haejoong Lee at the
Linguistic Data Consortium, funded by the National Science Foundation.
We gratefully acknowledge the support of Xiaoming Liu.

Steven Bird

--
Steven.Bird at ldc.upenn.edu  http://www.ldc.upenn.edu/sb
Assoc Director, LDC; Adj Assoc Prof, CIS & Linguistics
Linguistic Data Consortium, University of Pennsylvania
3615 Market St, Suite 200, Philadelphia, PA 19104-2608
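
The mapping from an OAI identifier to a resolver address, as seen in the
example URL for oai:ldc:LDC93S1, can be sketched as follows. This is only
an illustration under stated assumptions: the URL pattern is inferred from
the single example given in the announcement, the helper name resolver_url
is hypothetical, and nothing here reflects how the OLAC Resolver itself is
implemented.

```python
from urllib.parse import urlencode

# Base address copied from the example URL in the announcement.
RESOLVER_BASE = "http://www.language-archives.org/tools/lookup.php4"


def resolver_url(oai_identifier: str) -> str:
    """Build a lookup URL for an OAI identifier (hypothetical helper)."""
    # Assumption: identifiers of interest start with the "oai:" scheme,
    # as in the example oai:ldc:LDC93S1.
    if not oai_identifier.startswith("oai:"):
        raise ValueError(f"not an OAI identifier: {oai_identifier!r}")
    # safe=":" leaves the colons unescaped, matching the URL in the message.
    query = urlencode({"identifier": oai_identifier}, safe=":")
    return f"{RESOLVER_BASE}?{query}"


print(resolver_url("oai:ldc:LDC93S1"))
# prints http://www.language-archives.org/tools/lookup.php4?identifier=oai:ldc:LDC93S1
```

Passing any other record identifier in the same form should yield the
corresponding address, provided the resolver accepts the same query
parameter for every archive.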