OLAC Web-Crawler Gateway

Steven Bird sb at UNAGI.CIS.UPENN.EDU
Thu Jul 4 01:05:15 UTC 2002


ANNOUNCING THE OLAC WEB-CRAWLER GATEWAY

The OLAC Web-Crawler Gateway is an OLAC service provider that holds
the metadata from all of the 20+ registered OLAC archives.  The
gateway exports each OLAC metadata record as an HTML document, so
that web search engines can index it.  End-users who look for
language resources with conventional search engines can now come
across OLAC records in their search results.
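
To make the idea of exporting a record as HTML concrete, here is a
minimal sketch in Python.  It is not the code used by the gateway or
by DP9; the function name record_to_html, the identifier
oai:example:item-1 and the field values are all hypothetical, chosen
only to show a record becoming a static page that a crawler can read.

```python
from html import escape
from typing import Mapping


def record_to_html(identifier: str, fields: Mapping[str, str]) -> str:
    """Render one metadata record as a small static HTML page."""
    # Escape every name and value so characters such as & and < stay valid HTML.
    rows = "\n".join(
        f"  <dt>{escape(name)}</dt><dd>{escape(value)}</dd>"
        for name, value in fields.items()
    )
    title = escape(identifier)
    return (
        f"<html><head><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n<dl>\n{rows}\n</dl></body></html>"
    )


# Hypothetical identifier and placeholder values, not a real OLAC record.
page = record_to_html(
    "oai:example:item-1",
    {"title": "Example title", "publisher": "Example & Sons"},
)
print(page)
```

Because the page is plain HTML with no search form in front of it, a
crawler that follows a link to it can index the metadata directly.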

Language archives whose content is accessible only via a search form
cannot be indexed by search engines.  By participating in OLAC, their
content becomes accessible to people who use OLAC Service Providers,
and now also to people using conventional web search engines.

Here is the HTML presentation of the OLAC record oai:ldc:LDC93S1.  It
is what an end-user would see just one click from the web search engine:

  http://www.language-archives.org/tools/lookup.php4?identifier=oai:ldc:LDC93S1

Here is what the web search engine actually crawls:

  http://www.language-archives.org:8082/dp9/

The web-crawler gateway is based on three existing pieces of software:

1. DP9, the OAI Web-Crawler Gateway, developed by Xiaoming Liu at Old
   Dominion University [http://arc.cs.odu.edu:8080/dp9/].  The OLAC
   Web-Crawler Gateway is a port of DP9, hardcoded to serve OLAC records.

2. OLACA, the OLAC Aggregator, announced on this list last month
   [http://lists.linguistlist.org/cgi-bin/wa?A2=ind0206&L=olac-implementers].
   This is a special data provider which serves up all the records found
   in OLAC archives.

3. The OLAC Resolver, which maps the OAI identifier of an OLAC record
   to an HTML view of the record.  This service is accessible from the
   OLAC tools page [http://www.language-archives.org/tools.html].
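
The resolver step in item 3 can be sketched from the client side as
follows.  This assumes only the lookup URL pattern shown earlier in
this message (lookup.php4 with an identifier parameter); the function
name resolver_url is hypothetical, and nothing here reflects how the
resolver is implemented on the server.

```python
from urllib.parse import quote, urlencode

# Lookup endpoint as it appears in the example URL in this message.
RESOLVER = "http://www.language-archives.org/tools/lookup.php4"


def resolver_url(oai_identifier: str) -> str:
    """Build the resolver URL for an OAI identifier (hypothetical helper)."""
    # Keep ":" unescaped so the result matches the form of the example URL.
    query = urlencode({"identifier": oai_identifier}, safe=":", quote_via=quote)
    return f"{RESOLVER}?{query}"


print(resolver_url("oai:ldc:LDC93S1"))
```

Any other character that is not safe in a query string is still
percent-encoded, so identifiers containing spaces or ampersands do
not break the URL.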

The OLAC Web-Crawler Gateway was developed by Haejoong Lee at the
Linguistic Data Consortium, funded by the National Science Foundation.
We gratefully acknowledge the support of Xiaoming Liu.

Steven Bird

--
Steven.Bird at ldc.upenn.edu  http://www.ldc.upenn.edu/sb
Assoc Director, LDC; Adj Assoc Prof, CIS & Linguistics
Linguistic Data Consortium, University of Pennsylvania
3615 Market St, Suite 200, Philadelphia, PA 19104-2608


