[Lexicog] Incorporating an existing English/Vernacular word list/dictionary into a dictionary project....

Thapelo Otlogetswe thaps at YAHOO.COM
Wed Mar 26 15:09:56 UTC 2008


Ron
   
  I found the part of your message about generating a phonemic rendering of words from an orthographic word list interesting. For languages in which pronunciation is predictable from spelling, such a tool would certainly help in producing pronunciation dictionaries, although it may not immediately address the marking of tone in tonal languages (??). My question is this: if one had a word list of, say, 40,000 headwords and wished to upload it into FieldWorks and then generate a phonemic rendering of each word, how would one go about it? Assuming that process succeeded computationally, how would one then mark tone in the phonemic forms? In many African languages, tone is lexical but is not marked in the orthography. Will marking tone therefore require manual labour, or is there an elegant computational way of doing it?
   
  Many thanks
  Thapelo
  
Ronald Moe <ron_moe at sil.org> wrote:
  Heather Souter wrote:
  “Soon I will become part of a team that will be working to create the first dictionary of our language that focuses on the vernacular.”
   
  Hi Heather,
   
  It would be a very simple matter to incorporate the existing English-vernacular word list into a monolingual or bilingual vernacular-English dictionary. There are tools available that can reverse a dictionary. For instance, we could take the following input:
   
  \lx doctor
  \de tabibu; mganga; daktari
   
  and transform it into:
   
  \lx tabibu
  \de doctor
   
  \lx mganga
  \de doctor
   
  \lx daktari
  \de doctor
   
  This can be done in a couple of minutes, no matter how large your dictionary is.
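  A reversal like this can be sketched in a few lines of Python. This is a minimal illustration, not the tool referred to above; it assumes a plain Standard Format file in which each record has one \lx line (the English headword) followed by one \de line listing vernacular equivalents separated by semicolons. The function name reverse_sfm is hypothetical.

```python
from collections import defaultdict


def reverse_sfm(text: str) -> str:
    r"""Reverse English-headed \lx/\de records into vernacular-headed ones.

    Assumes (hypothetically) one \lx line per record, followed by one \de
    line whose vernacular equivalents are separated by semicolons.
    """
    # vernacular word -> list of English glosses, in order of first appearance
    reversed_entries = defaultdict(list)
    headword = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("\\lx "):
            headword = line[4:].strip()
        elif line.startswith("\\de ") and headword is not None:
            for word in line[4:].split(";"):
                word = word.strip()
                if word and headword not in reversed_entries[word]:
                    reversed_entries[word].append(headword)
    # Emit one record per vernacular word; multiple glosses are joined by "; "
    records = []
    for word, glosses in reversed_entries.items():
        records.append(f"\\lx {word}\n\\de {'; '.join(glosses)}")
    return "\n\n".join(records)


if __name__ == "__main__":
    sample = "\\lx doctor\n\\de tabibu; mganga; daktari\n"
    print(reverse_sfm(sample))
```

  Run on the doctor record, the script prints the three vernacular-headed records shown above. If the same vernacular word appears under several English headwords, its glosses are merged into a single \de line, which a lexicographer would still want to review.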
   
  There are also tools available that can help you update an orthography or transliterate one script into another (e.g. orthography into IPA). How long this takes depends on how much you need to interact with the changes. If the changes are regular, we could set up a table of correspondences and apply it to your whole database in a matter of minutes. However, if your orthography does not accurately reflect the phonology of the language, you will need a tool that lets you work interactively with a Find/Replace function. FieldWorks has a tool designed specifically for this task. FieldWorks is available free of charge from the SIL website, and it also includes a tool for collecting and typing words using the DDP word collection method. I would highly recommend FieldWorks, since it has the most powerful tools I am aware of for rapidly developing a dictionary database.
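  Applying a table of correspondences can be sketched as follows, assuming the mappings are regular. The table entries below are invented placeholders, not the rules of any real orthography, and transliterate is a hypothetical name. The function tries the longest orthographic sequence first, so a digraph such as "ng" is converted before its single letters are considered.

```python
# Hypothetical correspondence table: orthographic sequence -> IPA.
# These pairs are invented placeholders, not the rules of any real language.
CORRESPONDENCES = {
    "ng": "ŋ",
    "sh": "ʃ",
    "ch": "tʃ",
    "e": "ɛ",
    "o": "ɔ",
}


def transliterate(word: str, table: dict[str, str]) -> str:
    """Convert one word using longest-match-first lookup in the table.

    Characters with no entry are copied unchanged, so gaps in the table
    show up clearly when the output is reviewed.
    """
    max_len = max(len(key) for key in table)
    result = []
    i = 0
    while i < len(word):
        # Try the longest possible sequence at position i, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in table:
                result.append(table[chunk])
                i += size
                break
        else:
            # No entry matched: keep the original character.
            result.append(word[i])
            i += 1
    return "".join(result)


if __name__ == "__main__":
    word_list = ["shenga", "chocho"]  # hypothetical sample words
    for w in word_list:
        print(w, transliterate(w, CORRESPONDENCES))
```

  For example, transliterate("shenga", CORRESPONDENCES) returns "ʃɛŋa"; the final "a" is copied unchanged because it has no entry. A table like this can only convert what the spelling encodes; features the orthography leaves unmarked, such as lexical tone, would still have to be supplied by hand or with an interactive tool.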
   
  Since time is of the essence in your situation, DDP is the most efficient method of collecting lots of words in a short time. Many teams are collecting 10,000 to 20,000 words in a few weeks. The number of words collected depends on a number of factors, such as the number of mother-tongue speakers available to work on the project and how vigorous language use is. If only a few speakers of the language remain, you may collect far fewer words, but the results will still be much better than with other methods. You should also collect as many texts as possible, since they supplement the DDP method and provide solid evidence for semantic research.
   
  If you have other questions, post them to this discussion group and one of us will try to help you.
   
  Ron Moe
   
      
---------------------------------
  
  From: lexicographylist at yahoogroups.com [mailto:lexicographylist at yahoogroups.com] On Behalf Of Heather Souter
Sent: Monday, March 24, 2008 6:48 PM
To: lexicographylist at yahoogroups.com
Subject: [Lexicog] Incorporating an existing English/Vernacular word list/dictionary into a dictionary project....

   
Hello! I am a community linguist (with some formal and some informal training at the master's level) and a member of a community with a highly endangered language. I have been involved in some basic phonological analysis and also in revitalization efforts (creation of basic pedagogical materials). Soon I will become part of a team that will be working to create the first dictionary of our language that focuses on the vernacular. In other words, it will not be a translation of an English dictionary. It is exciting. However, not being trained in lexicography, I am finding the learning curve quite steep!

Here, I have a question.  An English-vernacular word list/dictionary of our language exists.  The headwords are English and there are one, two or three possible translations given in our language as well as some example sentences.  There is no grammatical information included at all.  (Still, it is a wonderful resource!)  I would like to know how this could be included in the dictionary project that will be starting shortly.  To complicate matters, the orthography is pretty good but not linguistically adequate (being based on English spellings!).  We likely will be using a different orthography (as well as IPA for research purposes).   The creation of a digital version of the existing word list/dictionary is possible (once permission is secured).

As the project is somewhat politically charged at present, I would prefer not to divulge the name of our language. I trust that you all understand how touchy projects like this can be and will not press me on this matter. Thank you for your understanding....

H.S.

PS: I have taken a look at the DDP developed by Ron Moe and have asked the project leader to consider this approach. I think it could work very well for us, as time is of the essence! Our Elders are passing away every day....

                      -------------------------------
Dr. Thapelo J. Otlogetswe
Corpus linguist & lexicographer
University of Botswana
Department of English
Private Bag 00703
Gaborone, Botswana
Tel: (+267) 355 2093



More information about the Lexicography mailing list