[Lexicog] Re: Citation forms in Prefixing Languages

Sat Mar 20 16:16:15 UTC 2004

I have been thinking about Athabaskan (specifically Navajo) dictionaries for a while and had
the following thoughts:

1. Using a standard XML format for texts and dictionary pages and putting them into a
native XML database. This is a database into which one simply loads well-formed or valid XML
documents; it does all the indexing and supports a query language such as XQuery
or XPath.

2. Using the Young and Morgan (YM) 'analytical lexicon' root-based approach for the
dictionary pages.

3. Using a hypertext method of lookup where the user would highlight the stem of any
verb in a text (in their web browser), and that would generate an XQuery against the dictionary
looking for the stem. Once the stem is found, all the applicable roots (usually one or very
few) are displayed for selection, and selecting one leads to the root page in the dictionary,
which has all the prefix and other information for the stem.
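
Thoughts 1 and 3 together amount to a stem-to-root lookup over XML lexicon documents. The following is only a minimal sketch under stated assumptions: the element names, root ids, page paths and the assignment of stems to roots are invented placeholders (not taken from YM), and Python's standard library stands in for a native XML database and its XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon document. Element names, root ids, page paths and the
# stem-to-root assignments are placeholders for illustration; only the stem
# form d33' is taken from this message.
LEXICON_XML = """
<lexicon>
  <root id="ROOT-1" page="roots/ROOT-1.xml">
    <stem>d33'</stem>
  </root>
  <root id="ROOT-2" page="roots/ROOT-2.xml">
    <stem>d33'</stem>
    <stem>placeholder-stem</stem>
  </root>
</lexicon>
"""

def roots_for_stem(lexicon, stem):
    """Return (root id, root page) for every root that lists the given stem."""
    matches = []
    for root in lexicon.iter("root"):
        if any(s.text == stem for s in root.findall("stem")):
            matches.append((root.get("id"), root.get("page")))
    return matches

lexicon = ET.fromstring(LEXICON_XML)

# The user highlights a stem; every applicable root is offered for selection.
for root_id, page in roots_for_stem(lexicon, "d33'"):
    print(root_id, page)
```

In a real setup the loop in roots_for_stem would presumably be replaced by a single query sent to the database, with the returned pages shown as links.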

This approach seems to have the following advantages:

1. No real programming or complex database setup is needed once the texts
and dictionary are in XML format. They are loaded into the native XML database (several
open source ones are available) and are accessible through XQuery. This makes for relatively
easy setup and maintenance, since it is just document maintenance for both text and
lexicon.

2. XQuery, of course, is not an easy thing for most users (myself included), but the
standard lookup queries would be created for them. These standard lookups would be
implemented as hypertext links in the texts using some sort of server-side page technology
such as JSP or Mason. This page would form a template into which any text from the XML
database would be loaded.

3. It allows creation of a web-based textbase system with lexical tools for schools etc.
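
The idea in advantage 2, that the standard lookups are prepared for the user as links, could look roughly like this. It is a sketch only: the /lookup path and the stem parameter name are hypothetical, and the server-side page (JSP, Mason or otherwise) that would turn the parameter into a query is not shown.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical path of the server-side page that runs the stored lookup query.
LOOKUP_PATH = "/lookup"

def lookup_link(stem):
    """Build the link that a highlighted stem would resolve to."""
    return LOOKUP_PATH + "?" + urlencode({"stem": stem})

def stem_from_link(link):
    """Recover the stem on the server side before running the query."""
    return parse_qs(urlsplit(link).query)["stem"][0]

link = lookup_link("d33'")
print(link)                  # the apostrophe is percent-encoded in the URL
print(stem_from_link(link))  # and decoded again on the server side
```

Encoding the stem matters here because the font number scheme uses apostrophes, which would otherwise need care inside both URLs and query strings.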

It also has the following problems:

1. The user still has to find and highlight the stem. Poser has suggested that a
morphological analyser is needed, and this would be good but would probably be brittle (at
least if I made one!). Can users find the stem easily?

2. Lacher mentioned that Navajo users liked the full head-word version of YM. My own
experience (which is probably much more limited than his) was similar: people liked
the head-word better but were not especially more successful with it. It was just more
comforting to see a whole word, if they did indeed find it. So I am advocating an analytical
approach, but perhaps incorrectly.

3. There are still a number of nagging little surface-form problems that would have to be
dealt with even with this manual 'user selection of the stem' routine (both stem-initial and
stem-final). For example (using the TNR Navajo font number scheme), the word da'iid33' gets a
falling tone when affixed, as in da'iid32'go (where 3 is high tone/nasal 'a' and 2 is nasal 'a').
This means that the user would be highlighting d32' when the dictionary form of the stem
is d33'. My impression is that there are not too many of these and that it is practical to put
them into the lexicon documents, but I may be wrong on this.
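
Putting the surface variants from problem 3 into the lexicon documents might be sketched as follows. The <form> and <variant> element names are assumptions made for illustration; only the d33' / d32' pair comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stem entry: <form> is the dictionary form and each <variant>
# is a surface form that a user might highlight in a text.
STEM_XML = """
<stems>
  <stem>
    <form>d33'</form>
    <variant>d32'</variant>
  </stem>
</stems>
"""

def build_variant_index(doc):
    """Map every listed surface form (and the dictionary form) to the dictionary form."""
    index = {}
    for stem in doc.iter("stem"):
        form = stem.findtext("form")
        index[form] = form
        for variant in stem.findall("variant"):
            index[variant.text] = form
    return index

def dictionary_form(index, highlighted):
    """Return the dictionary form, or None when the surface form is not listed."""
    return index.get(highlighted)

index = build_variant_index(ET.fromstring(STEM_XML))
print(dictionary_form(index, "d32'"))
```

A highlighted form that is not listed returns None, which is where a morphological analyser or a manual fallback would have to take over.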

I would appreciate hearing about any experiences with these issues or comments.

kip canfield
umbc

--- In lexicographylist at yahoogroups.com, "William Poser" <billposer at a...> wrote:
> The link I gave to "Making Athabaskan Dictionaries Usable" in
> my previous message was garbled. Let me try again:
>
> http://www.cis.upenn.edu/~wjposer/cgi-bin/load.cgi?http://www.cis.upenn.edu/~wjposer/.downloads/makath.pdf
