[Lexicog] Collaborative lexicography software?

maxwell at LDC.UPENN.EDU
Fri May 2 18:55:11 UTC 2008


Quoting Heather Souter <hsouter at gmail.com>:
> I, too, am very interested in learning about dictionary development
> for languages with complex morphologies.  ...
> Any insight into how to create dictionaries that are useful to
> speakers and learners and not only language specialists would be
> especially welcomed!

One "solution" (quote marks explained at the end of this msg) is to 
give people a computer program that allows them to look up words 
regardless of the inflected form that they type in.  For the simple 
cases, this can often be done by just looking for a substring of the 
typed-in word.  For a purely suffixing language, the substring would 
begin at the first letter of the typed-in word.
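
As a rough illustration of the simple suffixing case, here is a minimal sketch in Python. It assumes the citation form is a prefix of the typed-in word and tries ever-shorter prefixes until one is a headword. The function name and the two-entry English dictionary are invented for the example; they are not part of any real tool.

```python
# Toy headword -> gloss table (invented data, for illustration only).
toy_dictionary = {
    "walk": "to move on foot",
    "talk": "to speak",
}

def lookup_by_prefix(typed_word, dictionary):
    """Return (headword, gloss) for the longest headword that begins typed_word.

    Assumes a purely suffixing language, so the headword starts at the
    first letter of the typed-in word. Returns None if nothing matches.
    """
    for end in range(len(typed_word), 0, -1):
        candidate = typed_word[:end]
        if candidate in dictionary:
            return candidate, dictionary[candidate]
    return None

print(lookup_by_prefix("walking", toy_dictionary))
```

Trying the longest prefix first avoids stopping at a short headword that merely happens to begin a longer one.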

Of course, the simple cases are not the ones where people need the most 
help.  The complex cases--where there is prefixing (or worse, both 
prefixing and suffixing), or infixing, or reduplication, or lots of 
stem allomorphy--are the ones where people need help, and where the 
simple solutions don't work.  For these morphologically complex 
languages, there needs to be a morphological parser between the user 
and the electronic dictionary per se.  The parser's job is to remove 
all the affixes, undo any stem allomorphy, convert the stem into a 
dictionary citation form, and finally look up the citation form in the 
actual dictionary.
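
The parser steps just described can be sketched as a toy pipeline in Python. Everything here is an assumption made for illustration: the affixes, the allomorph table, the citation suffix, and the single dictionary entry belong to an invented mini-language, and a real parser (for example a finite-state one) would handle far more than this enumeration of prefix/suffix combinations.

```python
# Invented mini-language, for illustration only.
PREFIXES = ("ni", "ti")            # hypothetical person prefixes
SUFFIXES = ("tok", "s")            # hypothetical inflectional suffixes
ALLOMORPHS = {"kot": "kod"}        # surface stem variant -> underlying stem
CITATION_SUFFIX = "a"              # citation form = underlying stem + "a"
DICTIONARY = {"koda": "to carry"}  # citation form -> gloss

def candidate_stems(word):
    """Yield every stem left after removing an optional prefix and suffix."""
    prefix_options = [""] + [p for p in PREFIXES if word.startswith(p)]
    for prefix in prefix_options:
        rest = word[len(prefix):]
        suffix_options = [""] + [s for s in SUFFIXES if rest.endswith(s)]
        for suffix in suffix_options:
            stem = rest[:len(rest) - len(suffix)]
            if stem:
                yield stem

def look_up(word):
    """Return (citation_form, gloss) for an inflected word, or None."""
    if word in DICTIONARY:                    # already a citation form
        return word, DICTIONARY[word]
    for stem in candidate_stems(word):
        stem = ALLOMORPHS.get(stem, stem)     # undo stem allomorphy
        citation = stem + CITATION_SUFFIX     # build the citation form
        if citation in DICTIONARY:            # final dictionary lookup
            return citation, DICTIONARY[citation]
    return None

print(look_up("nikottok"))
```

Here the made-up form "nikottok" is split as ni + kot + tok, the stem variant "kot" is mapped back to "kod", and the citation form "koda" is what finally gets looked up.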

One project that is building such tools in a generic fashion (i.e. in a 
way that should be portable to more languages, as opposed to a 
proprietary way that just works for French, say), is a Department of 
Education funded project at the Linguistic Data Consortium (LDC).  
There's an example of how this works (for Arabic) at 
http://projects.ldc.upenn.edu/art/reader/source/Al-Kitaab.01.  In this 
case, the lookup is limited to the text shown there, but a simple 
modification would allow the user to type in words to be looked up.  
The project is also demonstrating lookup with the same tool on (a 
dialect of) Nahuatl, a morphologically complex language of Mexico.  
(Disclaimer: I'm a consultant on this project, hence biased :-).)

There are of course other reasons (besides morphology) that make it 
hard for people to look up words in dictionaries, such as spelling.  
One can imagine inserting a spell corrector between the user and the 
electronic dictionary.  For morphologically complex languages, such a 
spell corrector will almost certainly have to be built on top of a 
morphological parser.
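
One way to picture a parser-based spell corrector, as a minimal sketch only: generate every spelling that is one edit away from what the user typed, and keep just those that the parser can analyze. The analyze function below is a stand-in for a real morphological parser, and the English dictionary and suffix list are toy data invented for the example.

```python
import string

# Toy data (invented, for illustration only).
TOY_DICTIONARY = {"walk": "to move on foot", "talk": "to speak"}
TOY_SUFFIXES = ("", "s", "ed", "ing")

def analyze(word):
    """Stand-in for a real parser: return the headword if word is
    a headword plus one known suffix, else None."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix):
            stem = word[:len(word) - len(suffix)]
            if stem in TOY_DICTIONARY:
                return stem
    return None

def single_edits(word):
    """Return all strings one delete, swap, replace, or insert away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return deletes | swaps | replaces | inserts

def suggest(word):
    """Map each acceptable spelling to the headword the parser finds."""
    if analyze(word):
        return {word: analyze(word)}
    return {v: analyze(v) for v in sorted(single_edits(word)) if analyze(v)}

print(suggest("wakling"))
```

The point of filtering through the parser rather than a word list is that inflected forms never have to be listed one by one; any form the parser accepts is a valid correction.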

And of course my whole long-winded answer presupposes that electronic 
dictionaries (and the computers that they run on) are a reasonable 
solution for the language speakers.  For speakers of languages in 
California, that's probably true; for speakers in the Amazon, that may 
not be a solution at all.

   Mike Maxwell
   CASL/ U MD
