programmatic generation of word lists for specialized dictionaries?

Tim Stewart timoteostewart1977 at GMAIL.COM
Tue Jan 12 16:49:56 UTC 2016

I've written a specialized dictionary, and now I'm attempting to write an
article about how I used a computer program to help generate the word list
for it. I'm curious how often other lexicographers have employed a
programmatic approach to generating a list of hypothetical forms and then
testing those forms against corpora to determine which forms represent
lexical items in use (I believe it has been done at least once
before---details below). So far my efforts to dig up information about this
topic in JSTOR and other academic databases have been fruitless. Maybe the
ADS list can help!

My dictionary contains 350 lexical items, each of which is a blend of two
(or more) names of Christian denominations. Examples of these items are
*bapticostal* (*Bapti*st + Pente*costal*), *fundagelical* (*funda*mentalist
+ evan*gelical*), and *lutholic* (*Luth*eran + Cath*olic*). All the items
are formed by blending syllables from a small set of 23 names of
denominations (Anglican, Baptist, Catholic, Episcopal, etc.). Given the
very narrow morphological and phonological criteria involved, it occurred
to me to generate a list of possible items by programmatically combining
parts of the names of these 23 denominations. I then searched corpora and
online text databases for these hypothetical forms to determine which ones
were actually attested in use. I don't have the exact
results in front of me, but my computer program generated several thousand
hypothetical forms, and my searches then turned up quotational evidence for
around 100 terms. So the success rate was somewhere in the neighborhood of
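
A minimal sketch of this generate-then-test approach, in Python. It is not
the program actually used for the dictionary. The splitting rule (any
prefix of one name joined to any suffix of another, each at least three
letters long), the function names, and the one-sentence "corpus" are all
assumptions made for illustration, and only six names stand in for the 23.

```python
import re
from itertools import permutations


def candidate_blends(names, min_part=3):
    """Yield (blend, first, second) for each prefix-of-first + suffix-of-second.

    Assumption: parts are cut at every letter position, not at syllable
    boundaries, and each part keeps at least `min_part` letters.
    """
    for first, second in permutations(names, 2):
        a, b = first.lower(), second.lower()
        for i in range(min_part, len(a)):                 # proper prefixes of a
            for j in range(1, len(b) - min_part + 1):     # proper suffixes of b
                yield a[:i] + b[j:], first, second


def attested(candidates, corpus_text):
    """Return {blend: (first, second)} for candidates found in the corpus text."""
    tokens = set(re.findall(r"[a-z]+", corpus_text.lower()))
    found = {}
    for blend, first, second in candidates:
        if blend in tokens:
            # Keep the first source pair seen for each attested blend.
            found.setdefault(blend, (first, second))
    return found


# Hypothetical inputs: a handful of names and a toy stand-in for a corpus.
names = ["Baptist", "Pentecostal", "Lutheran", "Catholic",
         "fundamentalist", "evangelical"]
sample_text = "She grew up bapticostal but calls herself a lutholic now."

for blend, (first, second) in sorted(attested(candidate_blends(names), sample_text).items()):
    print(f"{blend}: {first} + {second}")
```

Cutting at every letter position overgenerates heavily, which matches the
pattern described above: many hypothetical forms, few attested ones. A
syllable-aware splitter would shrink the candidate list but needs a
syllabification for each name.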

So, on to my question... have there been other dictionaries whose word list
was (partly) built by programmatically generating hypothetical forms and
then winnowing that list against evidence of actual use?

My understanding is that it has happened at least once before. In their
*A Krio-English Dictionary* (OUP, 1980), Fyle and Jones describe a method
they used back in the early 1970s to rapidly build up their Krio word list:

“A search for all known monosyllables in the language, using native-speaker
competence. The method was simply to note all the combinations of
consonant(s) + vowel + consonant(s) ((C^n)V(C^n)) allowable by the
phonology of the language, and to record all those that turned out to be
actual Krio monosyllables. This search yielded well over 1,000
monosyllables” (xii).
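
The enumeration step in that passage can be sketched as a Cartesian product
of onsets, vowels, and codas. The inventories below are invented
placeholders, not Krio phonology. Treating an empty onset or coda as
allowable is also an assumption, and `is_real_word` stands in for the
native-speaker judgment that Fyle and Jones relied on.

```python
from itertools import product

# Placeholder inventories (hypothetical, NOT the Krio sound system).
# The empty string models an absent onset or coda.
ONSETS = ["", "b", "k", "t", "bl", "kr"]
VOWELS = ["a", "e", "i", "o", "u"]
CODAS = ["", "m", "n", "s"]


def monosyllable_candidates(onsets, vowels, codas):
    """Return every onset + vowel + coda string the inventories allow."""
    return [o + v + c for o, v, c in product(onsets, vowels, codas)]


def keep_attested(candidates, is_real_word):
    """Filter candidates with a caller-supplied judgment function."""
    return [form for form in candidates if is_real_word(form)]


candidates = monosyllable_candidates(ONSETS, VOWELS, CODAS)
print(len(candidates), "candidate monosyllables to review")
```

With 6 onsets, 5 vowels, and 4 codas this produces 6 x 5 x 4 = 120 forms to
check by hand, so a realistic inventory quickly yields a review list in the
thousands.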

Any help is appreciated.

Tim Stewart
tim at dictionaryofchristianese.com

P.S. For those who may be curious about this project and want to know more
about it, see http://www.dictionaryofblendeddenominations.com for a brief

