[Lexicog] Collaborative lexicography software?

Sun May 4 18:44:13 UTC 2008

Dear Mike:

You give an empty form in your example of how this works (for Arabic) at 
 http://projects.ldc.upenn.edu/art/reader/source/Al-Kitaab.01.  Can you send several actual examples of Arabic lexemes from this database, e.g. ta`arif or
suhuniyyat(un)? It is difficult to understand the structure of the database without actual lexemes. I also do not know whether the examples will be
displayed in readable form in the email.

Hayim Sheynin

maxwell at ldc.upenn.edu wrote:
 Quoting Heather Souter <hsouter at gmail.com>:
 > I, too, am very interested in learning about dictionary development
 > for languages with complex morphologies.  ...
 > Any insight into how to create dictionaries that are useful to
 > speakers and learners and not only language specialists would be
 > especially welcomed!

 One "solution" (quote marks explained at the end of this msg) is to 
 give people a computer program that allows them to look up words 
 regardless of the inflected form that they type in.  For the simple 
 cases, this can often be done by just looking for a substring of the 
 typed-in word.  For a purely suffixing language, the substring would 
 begin at the first letter of the typed-in word.
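A minimal sketch of the substring idea for a purely suffixing language, in Python. The lexicon, its entries, and the function name `lookup_by_prefix` are illustrative assumptions, not part of the LDC tool: the code simply tries ever shorter prefixes of the typed-in word until one matches a headword.

```python
# Toy lexicon (hypothetical data): headword -> gloss.
TOY_LEXICON = {
    "walk": "to move on foot",
    "talk": "to speak",
}


def lookup_by_prefix(typed_word, lexicon):
    """Return (headword, gloss) for the longest prefix of typed_word
    that is a headword in lexicon, or None if no prefix matches."""
    # Start with the whole word and drop one final letter at a time,
    # so the longest matching headword wins.
    for end in range(len(typed_word), 0, -1):
        candidate = typed_word[:end]
        if candidate in lexicon:
            return candidate, lexicon[candidate]
    return None


if __name__ == "__main__":
    print(lookup_by_prefix("walking", TOY_LEXICON))
    print(lookup_by_prefix("xyz", TOY_LEXICON))
```

This only works when the stem appears unchanged at the start of the inflected form, which is exactly the limitation discussed next.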

 Of course, the simple cases are not the ones where people need the most 
 help.  The complex cases--where there is prefixing (or worse, both 
 prefixing and suffixing), or infixing, or reduplication, or lots of 
 stem allomorphy--are the ones where people need help, and where the 
 simple solutions don't work.  For these morphologically complex 
 languages, there needs to be a morphological parser between the user 
 and the electronic dictionary per se.  The parser's job is to remove 
 all the affixes, undo any stem allomorphy, convert the stem into a 
 dictionary citation form, and finally look up the citation form in the 
 actual dictionary.
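The pipeline described above (strip affixes, undo stem allomorphy, map to a citation form, look up) can be sketched roughly as follows. Everything here is a toy assumption for illustration: the affix lists, the allomorph table, the dictionary, and the function names are invented, and a real parser for Arabic or Nahuatl would need far richer rules (typically a finite-state analyzer) than this.

```python
# Hypothetical toy rules, loosely English-like, for illustration only.
PREFIXES = ("re", "un")
SUFFIXES = ("ing", "ed", "s")
# Stem variants mapped back to the citation form used as the headword.
STEM_ALLOMORPHS = {"stopp": "stop", "tri": "try"}
DICTIONARY = {
    "stop": "to cease",
    "try": "to attempt",
    "do": "to perform",
}


def candidate_stems(word):
    """Yield the word with at most one prefix and one suffix removed."""
    prefix_options = [""] + [p for p in PREFIXES if word.startswith(p)]
    for prefix in prefix_options:
        rest = word[len(prefix):]
        suffix_options = [""] + [s for s in SUFFIXES if rest.endswith(s)]
        for suffix in suffix_options:
            stem = rest[:len(rest) - len(suffix)]
            if stem:
                yield stem


def to_citation_form(stem):
    """Undo stem allomorphy; unknown stems are returned unchanged."""
    return STEM_ALLOMORPHS.get(stem, stem)


def parse_and_lookup(word):
    """Return (citation_form, gloss) for the first analysis that hits
    the dictionary, or None if no analysis does."""
    for stem in candidate_stems(word):
        citation = to_citation_form(stem)
        if citation in DICTIONARY:
            return citation, DICTIONARY[citation]
    return None


if __name__ == "__main__":
    for typed in ("unstopped", "tried", "redoing", "xyz"):
        print(typed, "->", parse_and_lookup(typed))
```

The sketch generates every candidate analysis and lets the dictionary decide which one is real, which is one common way to keep the stripping rules simple; infixing and reduplication are not handled at all.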

 One project that is building such tools in a generic fashion (i.e. in a 
 way that should be portable to more languages, as opposed to a 
 proprietary way that just works for French, say), is a Department of 
 Education funded project at the Linguistic Data Consortium (LDC).  
 There's an example of how this works (for Arabic) at 
 http://projects.ldc.upenn.edu/art/reader/source/Al-Kitaab.01.  In this 
 case, the lookup is limited to the text shown there, but a simple 
 modification would allow the user to type in words to be looked up.  
 The project is also demonstrating lookup with the same tool on (a 
 dialect of) Nahuatl, a morphologically complex language of Mexico.  
 (Disclaimer: I'm a consultant on this project, hence biased :-).)

 There are of course other reasons (besides morphology) that make it 
 hard for people to look up words in dictionaries, such as spelling.  
 One can imagine inserting a spell corrector between the user and the 
 electronic dictionary.  For morphologically complex languages, such a 
 spell corrector will almost certainly have to be based on a 
 morphological parser.
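As a rough illustration of the simplest version of this idea, the sketch below suggests headwords that are close in spelling to what the user typed, using Python's standard `difflib.get_close_matches`. The headword list and the function name `suggest_headwords` are assumptions made for the example. It compares against citation forms only, so for a morphologically complex language it would have to sit behind (or be built into) a morphological parser, as noted above.

```python
import difflib

# Hypothetical list of dictionary headwords (citation forms only).
HEADWORDS = ["walk", "talk", "stop", "try"]


def suggest_headwords(typed_word, headwords, max_suggestions=3):
    """Return up to max_suggestions headwords whose spelling is close
    to typed_word, best match first (empty list if none is close)."""
    return difflib.get_close_matches(
        typed_word, headwords, n=max_suggestions, cutoff=0.6
    )


if __name__ == "__main__":
    print(suggest_headwords("wlak", HEADWORDS))
    print(suggest_headwords("qqqq", HEADWORDS))
```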

 And of course my whole long-winded answer presupposes that electronic 
 dictionaries (and the computers that they run on) are a reasonable 
 solution for the language speakers.  For speakers of languages in 
 California, that's probably true; for speakers in the Amazon, that may 
 not be a solution at all.

 Mike Maxwell
    CASL/ U MD


Dr. Hayim Y. Sheynin
