[Lexicog] The DDP and corpora

Ron Moe ron_moe at SIL.ORG
Thu Mar 18 22:06:54 UTC 2004


I'm a big fan of the corpus method, not because I use it (other than to
study Koine Greek), but because I see so many valuable benefits from using
it. The DDP resulted from my efforts to develop materials to facilitate the
production of dictionaries for minority languages. When I first started
working with a speaker of Lugungu to develop materials in Lugungu, I asked
him to collect everything written in the language. He replied, "There is
nothing written in Lugungu." You can't use the corpus method without a
corpus. So one of the first things we did was to hold a story writing
contest. We collected hundreds of pages of text, but the difficulties of
standardizing the orthography, typing and editing, and developing a parser,
made the process of forming a text corpus very time consuming. We can
collect more words (in their citation form) in less time using DDP, than we
could have by developing a sufficiently large text corpus. That is not to
say that we should not develop a text corpus. I am frequently surprised by
the definitions in corpus-based English dictionaries because they do not
agree with my native-speaker intuition. But upon reflection, and after
looking at their example sentences, I invariably realize they are right. So
I would recommend developing a text corpus for every language that doesn't
have one.
Unfortunately, with the extremely limited resources available to most
minority languages, it will take a long time to develop a large enough
corpus to supply enough examples to reveal patterns of usage for all but the
most common words. In the meantime, I'm trying to think how we can produce a
reasonably large and well-developed dictionary without a corpus. I'm
especially interested in ways to help speakers of a language do most of
the work with minimal training. I believe the only way is to develop an
easy-to-use, step-by-step procedure with all the materials, explanations,
and tools necessary to do each step.

You ask how we could discover dog metaphors. The method would involve
thinking about each aspect of a dog's anatomy and behavior, and then
thinking about the words and phrases we use to talk about them. Then think
about metaphorical extensions to human behavior (and possibly other areas of
life). The mind forms links between words and phrases. DDP utilizes these
links to jump from word to word within a domain. Let your mind roam. You
will be amazed at how fast it can come up with all sorts of words and
phrases that we use to talk about a topic. The mind will sometimes jump from
one word to another in a paradigmatic relationship (bark, woof, ruff, howl,
yowl, yip, yap). Then it will start on syntagmatic relationships (dogs have
fleas, dogs and cats). Then the mind will start rearranging or expanding
words into phrases (dogs have fleas > flea-bitten, dogs and cats > raining
cats and dogs). Sometimes the mind strays into another domain. When I was
thinking of dog metaphors, my mind went from 'claw' to 'claw your way to the
top', which is actually a cat metaphor. The mind is amazing. It takes a
little training and practice to get your mind to do a lot of free
association and creative word generation. Not everyone is equally good at
it. I'm actually not as good as some other people, in spite of all my
practice.
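
As a rough illustration only (this is not part of DDP or its materials), the
word-to-word jumping described above can be pictured as a walk over a small
graph of associations. The sketch below assumes a hand-built toy graph made
from the examples in this message. The names LINKS and collect_domain, and
the link labels "paradigmatic", "syntagmatic", and "expansion", are
hypothetical and chosen only for this example.

```python
from collections import deque

# Toy association links built from the examples in the message.
# The graph shape and the link labels are illustrative assumptions.
LINKS = {
    "dog": [
        ("bark", "syntagmatic"),
        ("fleas", "syntagmatic"),
        ("cat", "syntagmatic"),
        ("claw", "syntagmatic"),
    ],
    "bark": [
        ("woof", "paradigmatic"),
        ("howl", "paradigmatic"),
        ("yap", "paradigmatic"),
    ],
    "howl": [("yowl", "paradigmatic")],
    "yap": [("yip", "paradigmatic")],
    "fleas": [("flea-bitten", "expansion")],
    "cat": [("raining cats and dogs", "expansion")],
    "claw": [("claw your way to the top", "expansion")],
}


def collect_domain(seed, links):
    """Walk the links breadth-first and return items in the order reached."""
    seen = [seed]
    queue = deque([seed])
    while queue:
        word = queue.popleft()
        for neighbor, _kind in links.get(word, []):
            if neighbor not in seen:
                seen.append(neighbor)
                queue.append(neighbor)
    return seen


if __name__ == "__main__":
    for item in collect_domain("dog", LINKS):
        print(item)
```

A human mind does not, of course, follow a fixed graph. The point of the
sketch is only that starting from one word and following each kind of link
in turn reaches words and phrases (such as 'claw your way to the top') that
do not contain the starting word at all.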

Every domain is unique. It takes some work for me to fill out a domain and
subcategorize all the words in it. Then I have to analyze the semantics. I
look through the literature for help. Lakoff describes the metaphor 'anger
is heat and pressure'. That helped me identify some of the expressions we
use in English to talk about anger. The notion of extended metaphors that
apply to a domain has been very helpful. So has the literature on lexical
relations. Fillmore's work on case relations has influenced me for a long
time, which is why I drooled over the FrameNet web site (sorry about the dog
metaphor). All these insights help me to more rapidly think of words and
understand what they mean. One reason I spend as much time on this
discussion list as I do is that I'm learning so much from all of you.
Thanks.

Ron Moe
SIL, Uganda
  -----Original Message-----
  From: John Roberts [mailto:dr_john_roberts at sil.org]
  Sent: Thursday, March 18, 2004 1:01 PM
  To: lexicographylist at yahoogroups.com
  Subject: [Lexicog] The DDP and corpora


  Ron,

  Your DDP seems to be excellent for "discovering" a large proportion of the
vocabulary of a language in a short period of time. It is also excellent in
that it involves native speakers directly in the discovery process. Your W&D
article says, "It would take a text corpus of a million of words to equal
the results in terms of numbers, and many words are so rare they may not
show up in a text corpus." So, you can produce a large amount of vocabulary
items without a corpus. But don't you still need a large corpus to generate
collocational and combinatory fields and discover all the senses and uses of
words, for example? Even a dictionary like the Longman Language Activator,
which is organised around semantic domains, is corpus-based - and they
indicate that sometimes the meaning of a word from an analysis of the corpus
is different to what native speakers commonly assume a word means. Other
corpus-based dictionaries, such as NODE, also give some examples of this. I
am also not clear how you would discover all the 'dog' metaphors in English
you mentioned which do not include 'dog' in the expression. For example, an
expression like "Go at it tooth and nail." doesn't immediately spring to my
mind as a dog metaphor. What methodology do you use to work this out?

  John Roberts


