[RNLD] dictionary methods - discovering new words

Nick Thieberger thien at unimelb.edu.au
Mon Mar 30 05:08:35 UTC 2015


Thanks to all who responded to my request for methods for finding
additional words to add to an existing dictionary. Below are the responses
(both on- and off-list) collated into one message. I am also working with a
programmer to develop an online service in which you will be able to upload a
set of headwords in a language (with a list of consonants and vowels) and
have it generate a list of possible forms. More news on that as it develops.

Thanks for feedback from: Peter Austin, Claire Bowern, Sally Dixon,
Matthew Dryer, Sue Hansen, Tom Honeyman, František Kratochvíl, Bill Poser,
Ruth Singer.
+++++++++++++++
For finding new words I can suggest what I have been using myself:

- In FLEx under List you find the Anthropology Categories (I am sure you
can ask your anthro colleagues for that list and the numbering going with
it). The MC - Material Culture is a good one to start with, working by
Semantic Categories.

- Picture Dictionaries are great

I find the latter one less dry when working with communities in rural areas
and with people with less academic background. Thus, the method will
depend on who you are working with and in what conditions.
+++++++++++++++
I wrote a program a while back for generating possible words:
http://billposer.org/Software/WordGenerator.html

As for methods of finding new words, presumably everybody knows: (a)
collect texts and search them for words you don't already have; (b) if the
existing dictionary has example sentences, search them for words not already
entered as headwords.
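
A minimal sketch of methods (a) and (b), assuming plain-text input and a
very naive tokenizer. The function name find_missing_words and the sample
data are invented for illustration; real data would need language-specific
tokenization and some handling of inflected forms.

```python
import re
from collections import Counter

def find_missing_words(text, headwords):
    """Return (word, count) pairs for words in `text` not in `headwords`.

    Hypothetical helper: tokens are runs of letters, optionally joined by
    an apostrophe or hyphen, and matching is case-insensitive. Letters
    written with separate combining marks would need extra handling.
    """
    known = {h.lower() for h in headwords}
    # [^\W\d_] matches a single Unicode letter
    tokens = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text.lower())
    counts = Counter(t for t in tokens if t not in known)
    # Most frequent first, ties in alphabetical order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Invented sample data, not from any real language
headwords = ["kala", "mina", "tupa"]
text = "Kala mina tupa. Mina wira kala, wira lumu."
missing = find_missing_words(text, headwords)
```

The same function can be run over the example sentences exported from a
dictionary, passing the current headword list as the second argument.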
+++++++++++++++
Some of the ethnobiology apps could be useful - if they have things like
this for the Pacific?
https://play.google.com/store/apps/details?id=com.coolideas.eproducts.ausbirds&hl=en
+++++++++++++++
I used reference books for region-specific flora and fauna as stimulus
materials and found that this generated not only the nominal information
(names of the plant/animal) but also verbs relating to somewhat
species-specific behaviour.

+++++++++++++++
To do this, express your phonotactics as a regular expression, and then use
a generator like these:

https://code.google.com/p/xeger/
https://github.com/mifmif/Generex
https://github.com/asciimoo/exrex

to produce all possible strings. The more precise your regular expression
the better. Put a limit on the length of words of course! The last link has
a command line interface.
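
As a rough illustration of the idea, the sketch below enumerates candidate
forms from a simple (C)V(C) syllable template using only the Python standard
library, rather than one of the regex generators linked above. The phoneme
inventory, the template, the function names and the list of known headwords
are all invented for illustration; a real phonotactic description would be
more constrained.

```python
from itertools import product

def syllables(consonants, vowels, codas):
    """All syllables matching the assumed template (C)V(C)."""
    onsets = [""] + list(consonants)   # onset is optional
    finals = [""] + list(codas)        # coda is optional
    return ["".join(p) for p in product(onsets, vowels, finals)]

def candidate_words(consonants, vowels, codas, max_syllables, known=()):
    """Candidate forms of 1..max_syllables syllables, minus known headwords."""
    sylls = syllables(consonants, vowels, codas)
    forms = set()
    for n in range(1, max_syllables + 1):
        for combo in product(sylls, repeat=n):
            forms.add("".join(combo))
    return sorted(forms - set(known))

# Toy inventory: onsets p/t, vowels a/i, coda n; two headwords already known
cands = candidate_words("pt", "ai", "n", max_syllables=2, known=["pa", "tani"])
```

Even this toy inventory gives 12 syllables and 156 forms of up to two
syllables before known headwords are removed, which is why a length limit
matters.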
+++++++++++++++
I have tried Rapid Words (it's derived from Ron Moe's DDP).
I have done it slightly differently (for logistical reasons). We started with
about 2700 words.

In the summer of 2013, we went with about 15 participants up to 7000 words
in just three days.
Last summer, we continued with more people (about 20) and ended up with
15000 after four days.

It's quite tiring for people who have low literacy skills. There needs to
be a day off after every second day.

We worked on Abui (Papuan, Eastern Indonesia).
The method is excellent. Some people are happy to work in groups, others
alone. We get to discuss a lot about orthography and the socializing effect
is great too. (You have probably seen the Rapid Words video.)

We are now digitizing the data (we have no electricity in the field, so we
could not input it at the same time). I expect it will take at least one
more year before it's done.
We have developed a linking between DDP (Semantic domains) and Wordnet and
the idea is to create a Wordnet from the large 15,000-word dictionary and
further enrich senses through tagging of the corpus. That will again take a
few years, I expect. We have released the linking and will try to further
improve it, once we have the Abui data digitized.

We also tried to have all new words recorded, but the recording goes much
slower than the writing. So we have only about a third.
We will probably do another session for proofing and semantic relations
such as synonyms, antonyms etc, and a final session to get native
definitions (I would really like a monolingual dictionary for Abui as well).

Another great benefit was that we agreed with the participants what the
citation forms should be. Abui has a lot of prefixation and so you cannot
have a user-friendly dictionary organized according to the roots, which is
what I have done with a primer.
+++++++++++++++
WeSay is a nice implementation of Moe's method that can be used by literate
native speakers -- it also enables audio recording of words and example
sentences.

Monolingual defining was a method much favoured by the late Ken Hale as a
way to generate more vocabulary and has been used successfully by
several people I know.
The classic paper on this, by Casagrande and Hale, dates from 1967 --
available at
http://compbio.ucdenver.edu/Hunter_lab/Cohen/FieldMethods/Casagrande_Hale_1967.pdf
+++++++++++++++
1. I use a process I call 'shades of meaning'. I take an existing word in
the dictionary and ask open questions about that word. I use an A3 piece of
paper and write the word in the middle of the page with a circle around it.
Then we do a semantic map related to the word mapping out the shades of
meaning and semantic connections with lines. I have found speakers get what
I'm doing after a while and love to look back through the sheets and add
words. I leave a copy of the sheet with them overnight and they use a
different coloured pen to add words as they think of them.

For example, the word 'dog' has resulted in masses of words and phrases
such as big, mangy, skinny, bark, whine, scratch, yelp, flea, lying under
the bed, jumping, scratching etc.

One of the old ladies I work with on Tjupan language loves doing them as
she sees the sheets as a big word game and it really stimulates her
thinking.

2. Semantic domain stimulus pictures: I have a flip folder for each
semantic domain and add pictures from magazines, internet drawings etc as I
find them. Then the speakers and I look through and this helps stimulate
their thinking about a particular domain.

3. What's the same and what's different: I have a set of 60 cards with all
sorts of pictures on them. We shuffle the deck and choose two cards at
random. Placing them face up, we discuss what's the same and what's
different between the two pictures. Results in complex sentences as people
have to explain their thoughts. They may explain physical similarities
through to more abstract relationships. Also usually results in a lot more
laughing as the explanations get wild and wacky. Then put the cards back
and shuffle again.

4. Morphed pictures: A student who work-shadowed me had a series of
pictures which were of morphed animals. So a kangaroo with a sheep's head,
a car with horse's legs etc. He asked what people would call the being in
the picture and this resulted in lots of discussion about features and good
complex sentences.
+++++++++++++++
My favourite way is to brainstorm texts around particular topics, including
vernacular definitions, in small groups. It's a good way to generate
discussion of lexical concepts and how terms might be used.
+++++++++++++++
I have used monolingual defining, but another similar method I have used is
to ask a speaker to make up a sentence using a given word.  I have been
surprised that these sentences sometimes reveal new vocabulary at a faster
rate than texts, and often vocabulary that one might not expect to find in
texts.  It does depend on the speaker, however.  Some speakers only come up
with very simple sentences.
+++++++++++++++