[Lexicog] Collaborative lexicography software?
Hayim Sheynin
hsheynin19444 at YAHOO.COM
Sun May 4 18:44:13 UTC 2008
Dear Mike:
You give an empty form in your example of how this works (for Arabic) at
http://projects.ldc.upenn.edu/art/reader/source/Al-Kitaab.01. Could you
send several actual examples of Arabic lexemes from this database, e.g.
ta`arif or suhuniyyat(un)? It is difficult to understand the structure
of the database without actual lexemes. I also do not know whether the
examples will be displayed in readable form in the email.
Hayim Sheynin
maxwell at ldc.upenn.edu wrote:
Quoting Heather Souter <hsouter at gmail.com>:
> I, too, am very interested in learning about dictionary development
> for languages with complex morphologies. ...
> Any insight into how to create dictionaries that are useful to
> speakers and learners and not only language specialists would be
> especially welcomed!
One "solution" (quote marks explained at the end of this msg) is to
give people a computer program that allows them to look up words
regardless of the inflected form that they type in. For the simple
cases, this can often be done by just looking for a substring of the
typed-in word. For a purely suffixing language, the substring would
begin at the first letter of the typed-in word.
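As a rough illustration of the simple suffixing case described above, here is a minimal Python sketch. It is not the LDC tool; the function name lookup_by_prefix, the min_stem_len parameter, and the two-entry English lexicon are all invented for this example. It tries progressively shorter prefixes of the typed-in word until one matches a headword.

```python
# Toy lexicon, invented for illustration only.
LEXICON = {
    "walk": "to move on foot",
    "talk": "to speak",
}

def lookup_by_prefix(word, lexicon, min_stem_len=2):
    """Return (headword, gloss) for the longest prefix of `word` that is a
    headword in `lexicon`, or None if no prefix matches.

    This only makes sense for purely suffixing morphology, where the stem
    starts at the first letter of the inflected form.
    """
    for end in range(len(word), min_stem_len - 1, -1):
        candidate = word[:end]
        if candidate in lexicon:
            return candidate, lexicon[candidate]
    return None

print(lookup_by_prefix("walking", LEXICON))
```

Preferring the longest matching prefix is one plausible design choice; it avoids matching a short headword that merely happens to begin a longer one.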
Of course, the simple cases are not the ones where people need the most
help. The complex cases--where there is prefixing (or worse, both
prefixing and suffixing), or infixing, or reduplication, or lots of
stem allomorphy--are the ones where people need help, and where the
simple solutions don't work. For these morphologically complex
languages, there needs to be a morphological parser between the user
and the electronic dictionary per se. The parser's job is to remove
all the affixes, undo any stem allomorphy, convert the stem into a
dictionary citation form, and finally look up the citation form in the
actual dictionary.
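The parser steps just listed can be sketched in Python as follows. This is only a toy under stated assumptions: the affix lists, the allomorph table, and the small dictionary are invented (loosely Spanish-like) data, and the function names strip_affixes and lookup are hypothetical. A real morphological parser would be far more elaborate than table lookups like these.

```python
# Toy data, invented for illustration; not taken from the LDC project.
PREFIXES = ["re"]
SUFFIXES = ["amos", "es", "en", "o", "e"]
ALLOMORPHS = {"pued": "pod"}  # surface stem -> underlying stem
CITATION = {"pod": "poder", "habl": "hablar", "le": "leer"}  # stem -> citation form
DICTIONARY = {"poder": "to be able", "hablar": "to speak", "leer": "to read"}

def strip_affixes(word):
    """Yield every (prefix, stem, suffix) split, allowing empty affixes."""
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES + [""]:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if stem:
                yield prefix, stem, suffix

def lookup(word):
    """Return (citation form, gloss) pairs for each analysis that succeeds."""
    results = []
    for _prefix, stem, _suffix in strip_affixes(word):
        stem = ALLOMORPHS.get(stem, stem)   # undo stem allomorphy
        citation = CITATION.get(stem)       # stem -> citation form
        if citation in DICTIONARY:          # final dictionary lookup
            results.append((citation, DICTIONARY[citation]))
    return results

print(lookup("puedes"))
```

The function returns a list rather than a single entry because an inflected form can be ambiguous between several analyses; the user would then be shown all candidate entries.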
One project that is building such tools in a generic fashion (i.e. in a
way that should be portable to more languages, as opposed to a
proprietary way that just works for French, say) is a Department of
Education-funded project at the Linguistic Data Consortium (LDC).
There's an example of how this works (for Arabic) at
http://projects.ldc.upenn.edu/art/reader/source/Al-Kitaab.01. In this
case, the lookup is limited to the text shown there, but a simple
modification would allow the user to type in words to be looked up.
The project is also demonstrating lookup with the same tool on (a
dialect of) Nahuatl, a morphologically complex language of Mexico.
(Disclaimer: I'm a consultant on this project, hence biased :-).)
There are of course other reasons (besides morphology) that make it
hard for people to look up words in dictionaries, such as spelling.
One can imagine inserting a spell corrector between the user and the
electronic dictionary. For morphologically complex languages, such a
spell corrector will almost certainly have to be based on a
morphological parser.
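One way such a parser-based spell corrector might look, as a hedged Python sketch: generate every string within one edit of the typed word and keep only those the parser accepts. The stem and suffix sets and the function names parses, edits1, and suggest are assumptions made up for this example, and the "parser" here is a trivial stem-plus-suffix check standing in for a real one.

```python
import string

# Toy morphology, invented for illustration only.
STEMS = {"habl", "com"}
SUFFIXES = {"o", "as", "amos", "an"}

def parses(word):
    """Stand-in for a morphological parser: accept stem + suffix."""
    return any(word.endswith(s) and word[:-len(s)] in STEMS for s in SUFFIXES)

def edits1(word):
    """All strings one deletion, transposition, replacement, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def suggest(word):
    """Return the word itself if it parses, else parseable one-edit neighbours."""
    if parses(word):
        return [word]
    return sorted(w for w in edits1(word) if parses(w))

print(suggest("hablmos"))
```

Checking candidates against the parser, rather than against a list of headwords, is what lets the corrector handle inflected forms that never appear in the dictionary as such.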
And of course my whole long-winded answer presupposes that electronic
dictionaries (and the computers that they run on) are a reasonable
solution for the language speakers. For speakers of languages in
California, that's probably true; for speakers in the Amazon, that may
not be a solution at all.
Mike Maxwell
CASL/ U MD
Dr. Hayim Y. Sheynin