[Lexicog] Digest Number 353

Allan Johnson allan_johnson at SIL.ORG
Thu May 26 02:02:10 UTC 2005


Hi Dick,

The idea of a wordform inventory with links to the roots is a good one.  In my opinion, at least this much attention to the actual wordforms is necessary.  My opinion of course is biased.  This bias comes from my experience as a language learner many years ago.  I was trying to get a feel for the language by reading parts of the New Testament in a related language.  This wasn't easy, so I went to the dictionary of that language for help.  But it turned out to be almost no help, because this dictionary had followed a custom of listing only roots, with no indication of possible inflections or derivations.

Maintenance of links doesn't need to be a problem.  If I just keep a list of occurring wordforms in the entry for each root, then whenever links need to be updated, the computer should be able to simply regenerate the whole list.  What I'm picturing here is analogous to the automatic generation of an English index, as Shoebox does for MDF dictionaries.
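
To make the regeneration idea concrete, here is a minimal sketch in Python. It assumes the only thing maintained by hand is a flat mapping from each wordform to its root; the per-root lists are then rebuilt from scratch whenever needed. The function name and the sample wordforms are illustrative assumptions, not taken from Shoebox or from any actual dictionary file.

```python
from collections import defaultdict

def regenerate_wordform_lists(wordform_to_root):
    """Rebuild every root's wordform list from the flat wordform -> root mapping.

    Nothing is patched incrementally: the whole index is thrown away and
    regenerated, so stale links cannot accumulate.
    """
    by_root = defaultdict(set)
    for wordform, root in wordform_to_root.items():
        by_root[root].add(wordform)
    # Sort each list so the output is stable from one run to the next.
    return {root: sorted(forms) for root, forms in by_root.items()}

# Hypothetical sample data, for illustration only.
links = {
    "sulat": "sulat",
    "sumulat": "sulat",
    "sinulat": "sulat",
    "basa": "basa",
    "bumasa": "basa",
}

index = regenerate_wordform_lists(links)
for root, forms in sorted(index.items()):
    print(root, "->", ", ".join(forms))
```

With this arrangement, correcting a link means changing one line in the mapping and rerunning the script; the lists in the root entries are never edited directly.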

And how much work is it to create these lists of wordforms that I would like to keep in the entry for each root?  Not as much as might be expected, if you can make good use of a parser to semi-automate the process of identifying the root for each wordform.  I've done this with the word list from a New Testament, and it took about 2 months of work to complete.
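
As a rough picture of what "semi-automate" can mean here, the sketch below lets the computer accept a wordform only when exactly one root candidate is found, and sets everything else aside for a person to decide. It is a deliberately naive stand-in for a real parser: it only strips prefixes, and the prefix list, the root list and the function names are all hypothetical.

```python
def propose_roots(wordform, known_roots, prefixes):
    """Return the known roots this wordform could belong to (naive prefix stripping)."""
    candidates = set()
    if wordform in known_roots:
        candidates.add(wordform)
    for prefix in prefixes:
        if wordform.startswith(prefix):
            stem = wordform[len(prefix):]
            if stem in known_roots:
                candidates.add(stem)
    return sorted(candidates)

def triage(wordforms, known_roots, prefixes):
    """Accept unambiguous analyses automatically; queue the rest for manual review."""
    accepted = {}
    needs_review = []
    for wordform in wordforms:
        candidates = propose_roots(wordform, known_roots, prefixes)
        if len(candidates) == 1:
            accepted[wordform] = candidates[0]
        else:
            # Zero candidates or several: a person has to decide.
            needs_review.append(wordform)
    return accepted, needs_review

# Hypothetical sample data, for illustration only.
known_roots = {"sulat", "basa"}
prefixes = ["mag", "nag"]
accepted, needs_review = triage(["magsulat", "nagbasa", "sumulat"], known_roots, prefixes)
print(accepted)
print(needs_review)
```

The manual effort then goes into the review queue rather than into every wordform, which is where the saving comes from.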

I do need to admit to you that I probably don't fully grasp the size of the problem you're writing about for highly agglutinating languages.  My experience is just with Philippine languages, and the ones I've worked with use only 10-20 thousand different wordforms for a New Testament.

Allan


  ----- Original Message ----- 
  From: Dick_Watson at gial.edu 
  To: lexicographylist at yahoogroups.com 
  Sent: Thursday, May 26, 2005 6:47 AM
  Subject: Re: [Lexicog] Digest Number 353



  From: Dick Watson <dick_watson at gial.edu>
  Subject: Re: Digest Number 343 

  My response to the desire for all wordforms in an electronic dictionary is that you would have a monstrosity if the language were highly agglutinating.  You would not only have huge redundancy; you would also have all the work of dealing with each and every entry, deciding how much information to include in each one (most of which would be redundant), and forever maintaining all of the additions and corrections needed to keep up such a huge database.  Would you limit your wordforms to those actually found in a corpus, or would you also run through paradigms of all possible forms?  The latter would run into all kinds of problems with derivations, many of which would never occur or would not necessarily have the meaning predicted, quite apart from the sheer enormity of the task. 
  It could be more practical to have a separate simple wordform inventory with links to the roots, stems or citation forms in the dictionary, but even the maintenance of all those links would keep you from more important lexicographic tasks, not to mention taking time out to meet your grandchildren, if there had been time to have children. 

  Dick 
