[Lexicog] Potential words

Richard Rhodes rrhodes at BERKELEY.EDU
Thu Jul 9 17:28:28 UTC 2009


Richard,
	A version of this method is used 
extensively by the PDLMA (the Project for the 
Documentation of the Languages of Meso-America/El 
Proyecto para la Documentación de las Lenguas de 
Mesoamérica). Based on known phonotactics of 
related languages and preliminary work 
establishing sound correspondences in the dialect 
under investigation, a list of all possible roots 
is created (usually about 4000-5000 forms) and 
the fieldworkers slog their way through. The hit 
rate is pretty low -- maybe 20-30% -- and some 
existing roots are rejected in this process 
because they only occur in combinations and 
aren't readily recognizable in isolation, but, in 
general, this tool achieves great success, as 
shown by the fact that it regularly ferrets out 
heretofore unknown roots in language families 
that are fairly well studied.
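
The list-generation step described above can be sketched in a few lines of Python. This is only an illustration, not the PDLMA's actual tooling: the phoneme inventory, the templates, the banned sequence "ti", and the function name `candidate_roots` are all made up for the example, and a real list would be built from the phonotactics and sound correspondences established for the language under investigation.

```python
from itertools import product

# Hypothetical toy inventory; NOT the phonemes of any language in this thread.
CONSONANTS = ["p", "t", "k", "m", "n", "s"]
VOWELS = ["a", "i", "u"]

# Map each template symbol to the segments allowed in that slot.
SLOTS = {"C": CONSONANTS, "V": VOWELS}


def candidate_roots(template, banned=()):
    """Yield every string that fits a template such as "CVC",
    skipping forms that contain any banned sequence."""
    choices = [SLOTS[slot] for slot in template]
    for combo in product(*choices):
        form = "".join(combo)
        if not any(seq in form for seq in banned):
            yield form


# All CVC candidates, with no phonotactic filter.
cvc = list(candidate_roots("CVC"))

# CVCV candidates, filtered by one invented constraint ("ti" never occurs).
cvcv = list(candidate_roots("CVCV", banned=("ti",)))

print(len(cvc), cvc[:5])
print(len(cvcv))
```

Even this toy inventory shows how quickly the lists grow: 18 possible CV syllables give 324 two-syllable forms and 5,832 three-syllable forms before any filtering, so the phonotactic constraints are what keep a list to a size fieldworkers can get through.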

Rich Rhodes




>Hi Richard,
>
>I used this method with a Mon-Khmer language 
>many years ago just to collect words, but I also 
>checked it against my CVC distribution chart and 
>found certain combinations that were avoided for 
>unknown reasons--maybe they didn't sound good or 
>carried bad connotations--and certain areas for 
>special use. The one that I remember is that 
>children's given names shared an area on the 
>distribution chart that was not used for 
>anything else. This was particularly interesting 
>because there is a sanction against using the 
>same name as anyone else known to be living or 
>dead. I guess that a population explosion could 
>force them into using other areas for making 
>names.
>
>Dick
>
>
>
>
>
>Hi Richard,
>
>I did this kind of thing in the early days of 
>our language project in PNG. I had worked out 
>the consonant and vowel phonemes for Amele and 
>that word roots could be one, two or three 
>syllables. I then had someone at Ukarumpa High 
>School write a program (this was in 1978) to 
>generate possible word roots in Amele based on 
>the phoneme inventory and the syllable patterns. 
>The lists for the one- and two-syllable roots 
>weren't too long, but the list for the 
>three-syllable roots was enormous! I then distributed 
>these lists to various Amele people for them to 
>try and identify actual word roots. They thought 
>this was great fun. But the hard work was then 
>confirming that the roots indicated were actual 
>words and what their meaning and usage were.
>
>Another SIL member I know (who I met again just 
>recently) used the same method for their 
>language project. He said when he gave the lists 
>out to people one of the men asked, "Do you want 
>all the dirty words too?"
>
>But as I recall, this method of "generating" 
>words in a language was somewhat frowned upon by 
>the linguistic establishment in SIL-PNG in those 
>days. But I found I got a lot of words that I 
>might not have got hold of otherwise - such as 
>taboo words.
>
>I believe there is software available in SIL now 
>that can do this kind of thing for you. You 
>don't have to ask a high schooler to do it for 
>you. Oh, and the people you are working with 
>need to be literate in their own language.
>
>John Roberts
>
>
>
>Richard Gravina wrote:
>I'm interested in knowing more about the method 
>of data collection based on 'potential' words. 
>This is where you create lists of artificial 
>words by randomly combining letters, and then go 
>through the lists with native speakers to see if 
>the words actually exist in the language.
>
>
>Does anyone have any experience of using this? 
>Do you know of any resources or software that 
>would help?
>
>Richard Gravina
>
>
>
>
>


-- 
******************************************************************
    Richard A. Rhodes
    Department of Linguistics
    University of California
    Berkeley, CA 94720-2650
    Voice (510) 643-7325
    FAX (510) 643-5688

  ******************************************************************