<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<a class="moz-txt-link-abbreviated" href="mailto:maxwell@ldc.upenn.edu">maxwell@ldc.upenn.edu</a> wrote:

<blockquote cite="mid20080502145511.nillxm243ogw0wco@mail.ldc.upenn.edu"

 type="cite">

  <pre wrap="">Quoting Heather Souter <a class="moz-txt-link-rfc2396E" href="mailto:hsouter@gmail.com">&lt;hsouter@gmail.com&gt;</a>:

  </pre>

  <blockquote type="cite">

    <pre wrap="">I, too, am very interested in learning about dictionary development

for languages with complex morphologies.  ...

Any insight into how to create dictionaries that are useful to

speakers and learners and not only language specialists would be

especially welcomed!

    </pre>

  </blockquote>

  <pre wrap=""><!---->

One "solution" (quote marks explained at the end of this msg) is to 

give people a computer program that allows them to look up words 

regardless of the inflected form that they type in.  For the simple 

cases, this can often be done by just looking for a substring of the 

typed-in word.  For a purely suffixing language, the substring would 

begin at the first letter of the typed-in word.

Of course, the simple cases are not the ones where people need the most 

help.  The complex cases--where there is prefixing (or worse, both 

prefixing and suffixing), or infixing, or reduplication, or lots of 

stem allomorphy--are the ones where people need help, and where the 

simple solutions don't work.  For these morphologically complex 

languages, there needs to be a morphological parser between the user 

and the electronic dictionary per se.</pre>

</blockquote>

<br>
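A minimal sketch of the substring lookup Mike describes for a purely suffixing language. The toy lexicon, its glosses, and the function name <code>lookup_by_prefix</code> are assumptions made up for illustration, not part of any existing dictionary program.<br>
<pre>
```python
# Toy lexicon of stems; entries and glosses are invented for illustration.
LEXICON = {
    "walk": "to move on foot",
    "talk": "to speak",
    "walker": "one who walks",
}


def lookup_by_prefix(typed_word, lexicon=LEXICON):
    """Return headwords that the typed word begins with, longest first.

    This only works when every affix is a suffix, so that the stem is
    always an initial substring of the inflected form.
    """
    matches = [stem for stem in lexicon if typed_word.startswith(stem)]
    return sorted(matches, key=len, reverse=True)


print(lookup_by_prefix("walkers"))   # ['walker', 'walk']
print(lookup_by_prefix("unwalked"))  # [] : a prefix defeats the simple approach
```
</pre>
The second call shows why the simple solution stops working as soon as prefixing is involved.<br>
<br>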

For a dictionary user to be able to look up any wordform in a

computer-based (maybe online) dictionary, another approach would be to

explicitly list all forms in the dictionary. Because a printed dictionary

of that kind would take a great deal of paper, we're in the habit of avoiding

this approach. But as I've explored the capabilities of the FLEx program,

it strikes me that there seems to be an appropriate place to explicitly

list any wordform that we might desire to include as a lookup form. A

derived form can be given its own place as the headword of an entry,

and linked as a "complex form" to the root or stem from which it's

derived. An inflected form can be given its own place as the headword

of a minor entry and linked as an "inflectional variant" to the

uninflected form of the stem, or to the inflected form that users will

most likely try to look up.<br>

<br>
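The explicit-listing idea can be sketched roughly as follows. This is not FLEx's actual data model; the <code>Entry</code> class, its field names, and the sample English words are hypothetical, chosen only to show listed forms linking back to a main entry.<br>
<pre>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    headword: str
    gloss: str = ""
    link_type: Optional[str] = None    # e.g. "complex form" or "inflectional variant"
    link_target: Optional[str] = None  # headword of the entry this one points to


def build_index(entries):
    """Map every explicitly listed wordform to its entry."""
    return {entry.headword: entry for entry in entries}


def look_up(wordform, index):
    """Follow links from a listed form back to its main entry.

    Returns the chain of entries visited, or an empty list if the
    wordform was never listed.
    """
    chain = []
    seen = set()
    current = index.get(wordform)
    while current is not None and current.headword not in seen:
        chain.append(current)
        seen.add(current.headword)
        current = index.get(current.link_target) if current.link_target else None
    return chain


entries = [
    Entry("sing", gloss="to produce musical sounds with the voice"),
    Entry("sang", link_type="inflectional variant", link_target="sing"),
    Entry("singer", gloss="one who sings", link_type="complex form", link_target="sing"),
]
index = build_index(entries)
print([e.headword for e in look_up("sang", index)])  # ['sang', 'sing']
```
</pre>
No parsing happens at lookup time: a form is found only because someone listed and linked it.<br>
<br>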

Automated parsing could still have a role in such a dictionary, but the

role would be to assist in building the dictionary rather than to

assist in reading it. When analyzing words that it encounters in

vernacular texts, the parser would draw its conclusion regarding what

roots and affixes make up the word, and thus what entries it should be

linked to. Drawing on knowledge of the actual meanings of the words,

the human dictionary compiler would then decide whether to accept the

parser's choice or make links that the parser didn't predict. If a wrong

prediction involves some regularity of the language that the parser just

doesn't yet handle, the dictionary compiler could use this parser failure as

feedback to help improve the parser's success in future predictions. If

it involves an irregularity of the language that can't reasonably be

captured by the parser, then the word can just be left as residue as far as

the parser is concerned. A dictionary user will still be able to find

the word, since it has been explicitly listed and linked.<br>

<br>
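One way to picture this review step, as a sketch only. The <code>ReviewLog</code> class, the <code>review</code> function, and the hand-written "parser proposals" below are assumptions for illustration; no real parser is called.<br>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class ReviewLog:
    links: dict = field(default_factory=dict)        # wordform: entries it is linked to
    parser_todo: list = field(default_factory=list)  # regular patterns the parser missed
    residue: list = field(default_factory=list)      # irregular forms left outside the parser


def review(wordform, proposed, accepted, is_regular, log):
    """Record the compiler's decision about one parsed wordform.

    proposed   -- entries the parser suggested linking to
    accepted   -- entries the human compiler actually chose
    is_regular -- the compiler's judgment, used only when the two differ
    """
    log.links[wordform] = list(accepted)  # the form gets listed either way
    if list(proposed) != list(accepted):
        if is_regular:
            log.parser_todo.append(wordform)  # feedback for improving the parser
        else:
            log.residue.append(wordform)      # left alone as far as the parser goes
    return log


log = ReviewLog()
review("walked", ["walk", "-ed"], ["walk", "-ed"], True, log)
review("went", ["wend", "-t"], ["go", "-ed"], False, log)
review("unhappy", ["unhappy"], ["un-", "happy"], True, log)
print(log.parser_todo)  # ['unhappy']
print(log.residue)      # ['went']
```
</pre>
Every reviewed form ends up in <code>links</code>, so a user can find it whether or not the parser ever learns to handle it.<br>
<br>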

This approach wouldn't help users find words that

haven't yet been encountered in texts. So once the parser has "learned"

the language well enough to give fairly reliable results, it might be

profitable to combine Mike's approach with this one, using the parser

for lookup of any words that don't yet have exact matches in the

dictionary. And whenever this happens, the newly looked-up words could

be submitted for human review so that they can be explicitly listed for

future lookups.<br>

<br>
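The combined lookup might look something like this sketch. The <code>toy_parser</code> below is a stand-in that strips a single English suffix; a real morphological parser would be far more involved, and all names here are hypothetical.<br>
<pre>
```python
def toy_parser(wordform):
    """Stand-in parser: strips one known suffix. Purely illustrative."""
    for suffix in ("ing", "ed", "s"):
        if wordform.endswith(suffix) and len(wordform) > len(suffix):
            return wordform[: -len(suffix)]
    return None


def look_up(wordform, dictionary, review_queue, parser=toy_parser):
    """Try an exact match first, then fall back to the parser."""
    if wordform in dictionary:
        return dictionary[wordform]
    stem = parser(wordform)
    if stem is not None and stem in dictionary:
        # Not listed yet: remember it so a human can review and list it.
        review_queue.append((wordform, stem))
        return dictionary[stem]
    return None


dictionary = {"walk": "to move on foot", "walked": "past tense of walk"}
queue = []
print(look_up("walked", dictionary, queue))   # exact match, nothing queued
print(look_up("walking", dictionary, queue))  # found via the parser
print(queue)                                  # [('walking', 'walk')]
```
</pre>
Once a reviewer lists "walking" explicitly, later lookups of it would take the exact-match path.<br>
<br>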

Allan J.<br>

<br>


</body>

</html>