[Lexicog] Percentage of idioms vs single words
Philippe Humble
humble at CCE.UFSC.BR
Wed Feb 4 14:00:35 UTC 2004
Dear Ron,
Very interesting observations! I hadn't looked at it that way yet. You
probably already know, but in case you don't, John Sinclair has very
interesting considerations on collocations, multi-word items and the like
in his Corpus, Concordance, Collocation. Not a very recent publication,
but still very inspiring. I don't have the book here, but I think it's
Chapter 8. In the dictionary I have been working on intermittently for the
last 14 years (Portuguese-Spanish; one way; only translated examples) I
reached the same conclusion, to the point that the headwords were actually
reduced to 5000. The fact that it's also an electronic dictionary spares me
from having to think about how I would go about classifying the multi-word
items in a paper dictionary. (Here also Sinclair's brief remarks are
seminal.) The never-sufficiently-praised (as
Don Quixote would say) Oxford English-Spanish Dictionary was a milestone in
that respect. (There could be an earlier one...) But in this dictionary many
of the words in one language are simply not translated on their own but used
in a multi-word item, which is then translated (heedless: heedless OF sth:
heedless of the danger, the regiment ... / haciendo caso omiso del peligro,
el regimiento ...). Maybe multi-word items are predominant because that's
where common words acquire their meaning.
As for Patrick Hanks's remarks on corpora, I totally agree, but I have a few
observations. Everything depends on what you aim at with the examples of
your dictionary (since this is what the corpus is used for). I was a
critical observer at Cobuild 2 for a year. I started out accepting what they
called their orthodoxy, that of accepting all corpus material as evidence. I
ended up being convinced that producing evidence on language is one thing,
and writing a dictionary a different one. It's difficult to make a good
product if you're at the same time trying to make a theoretical point.
I do think that, even allowing for some bias due to the almost exclusively
written input of most corpora, corpora do reflect the real usage of
words, on the condition that you know how to evaluate the data, i.e.,
introspectively. A great number of assertions, which I thought were beyond
attack, crumbled when I started revising my own Portuguese-Spanish
dictionary, and when I started learning Japanese a few years ago. In other
words, when I started to confront my theoretical convictions with my
practice as a language teacher and learner. If you want to teach a learner
a word using not-meddled-with corpus examples, you need a lot of them. Say,
twenty, depending on the case. And when you deal with a language pair from
very different families (Indo-European/Japanese), you must forget about
using natural (corpus) examples and resort to grammatical examples,
excluding exactly the multi-word items.
The main question is: what do I want my dictionary to help with? If the
answer is "describe the language", it seems beyond any doubt that corpora
can help, or even are the dictionary. If the answer is "teach a
language", which is what learners' dictionaries presumably aim at, then the
answer is not so clear. Monolingual dictionaries were invented, très
tardivement, with a mixture of political and scientific intentions. They
are the treasure room of knowledge on language. Dictionaries with a
concrete aim, such as foreign-language dictionaries, have wholly different
goals and have to take into account the wishes of the audience, including
the unconscious ones. A vast topic.
Philippe Humblé
Universidade Federal de Santa Catarina (Brasil)
At 21:18 3/02/2004, you wrote:
One discovery (that has implications for us) was when I was trying to think
of English example words for each domain in my list of semantic domains. I
found that a high percentage were multi-word lexical items. In some domains
I quickly ran out of single word entries, but could think of lots of
phrases. This phenomenon was repeated in a couple of workshops for Bantu
languages. The speakers were generating about 25% phrases.
I presume (without a lot of data to back me up) that our dictionaries should
have a goodly percentage of multi-word entries. A quick scan of Longman's
Language Activator shows about 50% multi-word entries. Can anyone give
figures for their dictionaries? Has anyone worked at identifying/generating
multi-word lexical items in such a way that they can estimate the percentage
of idioms vs single words in a language? I realize that there is a gradation
from collocation to idiom, so that it may be difficult to draw a line.
Ron Moe
SIL, Uganda
-----Original Message-----
From: List Facilitator
[mailto:lexicography2004 at yahoo.com]
Sent: Monday, February 02, 2004 10:11 PM
To: lexicographylist at yahoogroups.com
Subject: [Lexicog] Interesting lexical discoveries
What are one or two of the most interesting discoveries that stand out for
you (plural) in any of the lexical research that you have done?
Wayne Leman
Cheyenne dictionary project