computational tool for exploration of language dictionaries

Wed Jan 8 20:07:53 UTC 2003

An interesting project presenting lexical databases using XML.
As of August 2002, the Nahuatl version (Kitlkitl) is in alpha.

http://www-nlp.stanford.edu/kirrkirr/

Kirrkirr:
a computational tool for the exploration of indigenous language dictionaries

Kirrkirr is a research project exploring the use of computer software
for automatic transformation of lexical databases ("dictionaries"),
aiming at providing innovative information visualization, particularly
targeted at indigenous languages. As a first example, it can generate
networks of words automatically from dictionary data.
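To illustrate what generating a word network from dictionary data could
look like, here is a minimal sketch in Java. It is not Kirrkirr's actual
code. The class WordNetworkSketch, the method buildNetwork, and the
placeholder headwords are all assumptions made for illustration, and the
input is assumed to be a map from each headword to the headwords its entry
cross-references.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: not part of Kirrkirr.
class WordNetworkSketch {

    /**
     * Builds an undirected network: every cross-reference from one
     * headword to another becomes an edge visible from both ends.
     */
    static Map<String, Set<String>> buildNetwork(Map<String, List<String>> crossRefs) {
        Map<String, Set<String>> graph = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : crossRefs.entrySet()) {
            String source = entry.getKey();
            // Keep headwords that have no links as isolated nodes.
            graph.computeIfAbsent(source, k -> new TreeSet<>());
            for (String target : entry.getValue()) {
                graph.get(source).add(target);
                graph.computeIfAbsent(target, k -> new TreeSet<>()).add(source);
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        // Placeholder headwords, not real dictionary data.
        Map<String, List<String>> crossRefs = new LinkedHashMap<>();
        crossRefs.put("alpha", List.of("beta", "gamma"));
        crossRefs.put("beta", List.of("gamma"));
        crossRefs.put("delta", List.of());
        buildNetwork(crossRefs).forEach(
            (word, neighbours) -> System.out.println(word + " -> " + neighbours));
    }
}
```

The adjacency map produced here is the kind of structure a graph display
could draw from. Semantic relations such as synonymy could be kept as
labels on the edges if the dictionary records them.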

The central idea motivating our research is that given any sort of
well-structured lexical database, software should be able to automatically
provide all sorts of value-added functionality. In recent years, there
has been an enormous amount of work on different proposals for structuring
and storing lexical databases, but almost no work on providing electronic
dictionary interfaces which make use of this structure to provide
human access and usability through information transformation and
visualization. Kirrkirr explores ways of solving this unaddressed need.

Technical details:

Kirrkirr is designed so that it can work with any dictionary in XML format
(XML is a new-ish, but already widespread standard for representing textual
and other data, especially on the WWW). Most of our initial experience and
papers concern applying Kirrkirr to Warlpiri, an Indigenous Australian
language, but lately we've been building a version for Nahuatl, an Indigenous
language of Mexico. Kirrkirr achieves this flexibility through use of a dictionary
specification file (also in XML, mainly using XPath) which maps dictionary
constructs to Kirrkirr constructs. Such a file does have to be written for
each dictionary schema. Formatted entries are rendered using parameterized
XSLT files, which can be customized for each dictionary schema.
Other dictionary access is by XPath expressions accompanied by regular
expression matching. The program is written in Java. Where possible we run it
using current Java versions, but it is compatible with JDK1.1.8+Swing1.1,
so that we can run it on MacOS 8 or 9 (still common in Australian schools!).
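
The mapping idea described above can be sketched with the standard Java
XML APIs. This is only an illustration under stated assumptions: the
element names (DICTIONARY, ENTRY, HW, GLOSS), the construct names
("headword", "gloss"), and the class DictionarySpecSketch are invented
here and are not taken from Kirrkirr or its specification file format.
The sketch also uses javax.xml.xpath, which is available in current Java
versions but not in JDK 1.1.8.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: not part of Kirrkirr.
class DictionarySpecSketch {

    // Invented sample dictionary with placeholder entries.
    static final String SAMPLE_XML =
        "<DICTIONARY>"
        + "<ENTRY><HW>alpha</HW><GLOSS>first placeholder gloss</GLOSS></ENTRY>"
        + "<ENTRY><HW>beta</HW><GLOSS>second placeholder gloss</GLOSS></ENTRY>"
        + "<ENTRY><HW>gamma</HW><GLOSS>third placeholder gloss</GLOSS></ENTRY>"
        + "</DICTIONARY>";

    /** Assumed spec: generic construct name -> XPath into this schema. */
    static Map<String, String> sampleSpec() {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("headword", "/DICTIONARY/ENTRY/HW");
        spec.put("gloss", "/DICTIONARY/ENTRY/GLOSS");
        return spec;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse dictionary XML", e);
        }
    }

    /** Selects nodes by XPath, then keeps those whose text matches the regex. */
    static List<String> lookup(Document doc, String xpathExpr, String regex) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                (NodeList) xpath.evaluate(xpathExpr, doc, XPathConstants.NODESET);
            Pattern pattern = Pattern.compile(regex);
            List<String> hits = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getTextContent();
                if (pattern.matcher(text).find()) {
                    hits.add(text);
                }
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalStateException("lookup failed for " + xpathExpr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE_XML);
        // Find headwords starting with "b" via the spec's "headword" mapping.
        System.out.println(lookup(doc, sampleSpec().get("headword"), "^b"));
    }
}
```

Supporting a different dictionary schema would then mean supplying a
different set of XPath expressions for the same construct names, while the
lookup code stays unchanged.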