Fw: [Lexicog] How to select words for a dictionary

Patrick Hanks hanks at BBAW.DE
Mon Mar 15 11:00:29 UTC 2004


Hello fellow lexicographers and Mery in particular ---

I tried to send the message below and another one on the same subject from
my home email last week (I was nursing a stomach bug) -- but it bounced.  So
I'm resending them now. Better late than never, I suppose.

Patrick


----- Original Message -----
From: "Patrick Hanks" <patrick_hanks at gmx.de>
To: <lexicographylist at yahoogroups.com>
Sent: Thursday, March 11, 2004 5:00 PM
Subject: Re: [Lexicog] How to select words for a bilingual dictionary


>
>
> I agree wholeheartedly with Wayne that lexicography should be
corpus-based. I would turn the question around, however -- i.e. not  "What
should we put in?" but "What can we leave out?"
>
> Richard Chenevix Trench said in 1858 (in a paper that was very influential
on OED) that a dictionary is an "inventory" of the language and that
lexicographers should consider themselves as inventory clerks. Of course, an
inventory aims to be a list of all the items in stock.  But it turns out to
be impossible to compile an inventory of "all" the words in a language, for
several reasons, among them:  a) because the lexicon is dynamic -- even if
you were to discover all the words that have ever been used up to now,
people will invent new ones tomorrow; and b) because in practice it's hard
to decide what counts as a word (is tete-a-tete an English word? Is
oompahing? And what about all those noun-noun compounds that we discussed
last month?)
>
> So selection criteria have to be developed.  One common -- though rarely
acknowledged -- selection criterion for dictionaries is, "This word is in a
competing dictionary, so we dare not leave it out."  Such are the pressures
of commercial publishing.  :-)
>
> Anyway, even the smallest dictionaries try to include all "normal" words
of the language. Smaller dictionaries generally have fewer and shorter
definitions; fewer examples; and they exclude domain-specific terms (e.g.
the technical vocabulary of science, engineering, sports, law, etc., etc.)
>
> In my experience, lexicographers editing smaller dictionaries can spend
quite a lot of time agonizing about what to leave out. A corpus can provide
reassurance on this.  If a word does not occur at all in 100 million words
of carefully selected, balanced English texts, it can't be very important,
even if it IS in somebody else's dictionary.  The smaller the dictionary,
the higher the threshold.  Here's an example.  The word "catatonia" occurs
8 times in the British National Corpus, mostly in medical contexts. That's
quite rare. If pressed for space, the compiler of a small dictionary might
decide to leave it out.  But then, the marketing people might say (rightly),
"But that's exactly the sort of word people will want to look up!"  So there
is a difficult decision to make, with no "correct" answer.

> Even a small dictionary would probably include "catatonic".  It has over
30 occurrences in BNC, and some of them are not at all domain-specific to
medical contexts.
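
[A minimal sketch of the frequency-threshold idea described above, not a
method taken from the message itself. The count of 8 for "catatonia" is the
figure quoted above; the count for "catatonic" stands in for "over 30", and
the extra word, the two cut-off values, and the function names are all
hypothetical placeholders chosen for illustration.]

```python
# Counts per 100 million words. Only "catatonia": 8 comes from the message;
# the other values are illustrative placeholders, not real corpus data.
counts = {
    "catatonia": 8,
    "catatonic": 31,              # placeholder for "over 30"
    "hypothetical_rare_word": 0,  # invented entry with no corpus hits
}


def per_million(count, corpus_size=100_000_000):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def select_headwords(counts, min_count):
    """Return, in alphabetical order, the words at or above the cut-off."""
    return sorted(word for word, n in counts.items() if n >= min_count)


# Hypothetical cut-offs: the smaller the dictionary, the higher the threshold.
large_dictionary = select_headwords(counts, min_count=5)
pocket_dictionary = select_headwords(counts, min_count=20)
```

[With these placeholder numbers the pocket list keeps "catatonic" but drops
"catatonia". A cut-off like this only flags candidates for omission; the
final call (e.g. "that's exactly the sort of word people will want to look
up") remains an editorial one.]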
>
> I suspect -- but do not know -- that bilingual lexicographers sometimes
look to the monolingual dictionaries as well as (or even instead of) a
corpus for guidance on the word list.

> Mery, for your thesis, you might want to select a couple of comparatively
rare words like this, look at their frequency in a relevant corpus, and then
look and see what various smaller dictionaries did with them.
>
> And what about "the web as corpus"?  Well, one advantage of using a
"prepared" corpus like BNC is that it aims to be a balanced, representative
sample of English texts, so comparisons of frequencies are probably more
reliable, whereas the web is a vast unbounded collection of texts where
"anything goes" - not such a good basis for comparisons of frequency.
>
> One other thought re bilingual dictionaries. Another thing that
lexicographers agonize about is names - in bilingual dictionaries this means
place names. Should Milano - Milan - Mailand be in the dictionary, for
example?
>
>
> Patrick
>
>
> ----- Original Message -----
> From: "Wayne Leman" <wayne_leman at sil.org>
> To: <lexicographylist at yahoogroups.com>
> Sent: Wednesday, March 10, 2004 8:18 PM
> Subject: [Lexicog] How to select words for a bilingual dictionary
>
>
> > Mery, I would try to practice corpus linguistics, using a computer to
> > search large corpuses of natural text (newspapers, conversations, etc.),
> > then do word counts (with the computer) to find the most commonly used
> > words.
> >
> > Wayne Leman
> > Cheyenne dictionary project
> >
> > > Dear all,
> > > in my MA thesis on bilingual lexicography I am describing the ways in
> > > which dictionary words can be selected. I know that it depends on the
> > > variety of language treated in the dictionary. Imagine that you had to
> > > select from your own language the words to treat in a big general
> > > language bilingual dictionary and those for a pocket one, how would
> > > you do it?
> > > Regards,
> > > Mery Martinelli
> > > SSLMIT, Bologna (Italy)
> >
> >
> >
> >
>
>



