27.4332, Qs: Dictionary Wordlist from List of Generated Words
The LINGUIST List via LINGUIST
linguist at listserv.linguistlist.org
Wed Oct 26 20:12:53 UTC 2016
LINGUIST List: Vol-27-4332. Wed Oct 26 2016. ISSN: 1069 - 4875.
Subject: 27.4332, Qs: Dictionary Wordlist from List of Generated Words
Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry,
Robert Coté, Michael Czerniakowski)
Homepage: http://linguistlist.org
***************** LINGUIST List Support *****************
Fund Drive 2016
25 years of LINGUIST List!
Please support the LL editors and operation with a donation at:
http://funddrive.linguistlist.org/donate/
Editor for this issue: Kenneth Steimel <ken at linguistlist.org>
================================================================
Date: Wed, 26 Oct 2016 16:12:45
From: Tim Stewart [timoteostewart1977 at gmail.com]
Subject: Dictionary Wordlist from List of Generated Words
I've written a specialized dictionary about a religious sociolect, and now I'm
attempting to write an article about how I used a computer program to help
generate the word list for it. I'm curious how often in the past other
lexicographers have employed a programmatic approach to generating a list of
hypothetical forms and then tested those forms against corpora to determine
which hypothetical forms represent actual lexical items in use (I believe it
has been done at least once before; details below). So far my efforts to dig
up information about this topic in JSTOR and other academic databases have
been fruitless. Maybe the LINGUIST List community can help!
My dictionary contains 350 lexical items, each of which is a blend of two (or
more) names of Christian denominations. Examples of these items are
bapticostal (Baptist + Pentecostal), fundagelical (fundamentalist +
evangelical), and lutholic (Lutheran + Catholic). All the items are formed by
blending syllables from a small set of about two dozen names of denominations
(Anglican, Baptist, Catholic, Episcopal, etc.). Given the very narrow
morphological and phonological criteria involved, it occurred to me to
generate a list of possible items by programmatically combining parts of
these denomination names. Then I conducted searches for these
hypothetical forms against corpora and online text databases to determine
which forms I could find evidence for. I don't have the exact results in front
of me, but my computer program generated several thousand hypothetical forms,
and my searches then turned up quotational evidence for around 100 terms. So
the success rate was somewhere in the neighborhood of 2%.
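
To make the generate-then-test idea concrete, here is a minimal sketch in
Python. It is not the program I actually used. It assumes a simplification:
blends are formed at the character level (a prefix of one name plus a suffix
of another) rather than by syllable, which over-generates but still covers
the three examples above. The function names, the min_part threshold, and the
toy token list standing in for a corpus are all hypothetical.

```python
from itertools import permutations


def blend_candidates(names, min_part=3):
    """Return every prefix-of-A + suffix-of-B string for ordered pairs (A, B).

    Assumption: character-level splitting with at least `min_part` letters
    kept from each name; a syllable-aware splitter would generate fewer forms.
    """
    lowered = [n.lower() for n in names]
    candidates = set()
    for a, b in permutations(lowered, 2):
        for i in range(min_part, len(a)):              # proper prefixes of a
            for j in range(1, len(b) - min_part + 1):  # proper suffixes of b
                candidates.add(a[:i] + b[j:])
    # A blend that happens to spell one of the source names is not a new form.
    return candidates - set(lowered)


def attested(candidates, corpus_tokens):
    """Keep only the hypothetical forms that occur in the corpus tokens."""
    return sorted(candidates & set(corpus_tokens))


# Hypothetical input: six of the roughly two dozen source names.
NAMES = ["Baptist", "Pentecostal", "Lutheran", "Catholic",
         "fundamentalist", "evangelical"]
candidates = blend_candidates(NAMES)

# Toy stand-in for a tokenized corpus; a real run would query actual corpora.
tokens = ["the", "bapticostal", "church", "lutholic"]
hits = attested(candidates, tokens)
hit_rate = len(hits) / len(candidates)
print(len(candidates), hits, f"{hit_rate:.2%}")
```

The hit rate computed at the end is the same kind of figure as the rough 2%
mentioned above: attested forms divided by generated forms.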
My question: Have there been other dictionaries whose word list was (even
partly) built by programmatically generating hypothetical forms and then
winnowing them against evidence of actual use?
My understanding is that it has happened at least once before. In their
"A Krio-English Dictionary" (OUP, 1980), Fyle and Jones describe a method
they used in the early 1970s to rapidly build up their Krio word list:
“A search [was made] for all known monosyllables in the language, using
native-speaker competence. The method was simply to note all the combinations
of consonant(s) + vowel + consonant(s) ((C^n)V(C^n)) allowable by the
phonology of the language, and to record all those that turned out to be
actual Krio monosyllables. This search yielded well over 1,000 monosyllables”
(xii).
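
As I read it, the procedure in that quotation amounts to enumerating every
onset + vowel + coda combination and keeping the ones a native speaker
accepts. A rough sketch of that reading follows. The inventories below are
hypothetical placeholders and not Krio phonology, the empty onset and coda
are my assumption about what (C^n) permits, and the "accepted" set merely
stands in for native-speaker judgment.

```python
from itertools import product


def candidate_monosyllables(onsets, vowels, codas):
    """Enumerate every onset + vowel + coda string, i.e. (C^n)V(C^n)."""
    return [o + v + c for o, v, c in product(onsets, vowels, codas)]


def winnow(candidates, is_real_word):
    """Keep the candidates that the judge (a callable) accepts as real words."""
    return [w for w in candidates if is_real_word(w)]


# Hypothetical placeholder inventories, NOT the actual Krio inventories.
ONSETS = ["", "p", "t", "k", "st"]   # "" means no onset consonant
VOWELS = ["a", "i", "u"]
CODAS = ["", "n", "t"]               # "" means no coda consonant

forms = candidate_monosyllables(ONSETS, VOWELS, CODAS)

# Stand-in for native-speaker competence: an illustrative set of accepted forms.
accepted = {"pan", "tin", "stat"}
real = winnow(forms, lambda w: w in accepted)
print(len(forms), real)
```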
Any leads and suggestions are appreciated.
Tim Stewart
tim at dictionaryofchristianese.com
Linguistic Field(s): Lexicography
------------------------------------------------------------------------------