[Lexicog] Percentage of idioms vs single words
Lou Hohulin
lou_hohulin at SIL.ORG
Wed Feb 4 15:23:05 UTC 2004
Philippe,
Wow! You have brought up at least three very important issues for all of us who are working on dictionaries -- the theoretical, the political and the practical. Thank you for this e-mail.
Lou
On Wed, 04 Feb 2004 15:00:35 +0100
Philippe Humble <humble at cce.ufsc.br> wrote:
> Dear Ron,
>
> Very interesting observations! I hadn't looked at it that way yet. You probably already know, but in case you don't, John Sinclair has very interesting considerations on collocations, multi-word items and the like in his "Corpus, Concordance, Collocation". Not a very recent publication, but still very inspiring. I don't have the book here, but I think it's Chapter 8. In the dictionary I have been working on intermittently for the last 14 years (Portuguese-Spanish; one way; only translated examples) I reached the same conclusion, to the point that the number of headwords was actually reduced to 5000. The fact that it's also an electronic dictionary means I am not obliged to think about how I would go about classifying the multi-word items in a paper dictionary. (Here also Sinclair's brief remarks are seminal.) The never-sufficiently-praised (as Don Quixote would say) Oxford English-Spanish Dictionary was a milestone in that respect. (There could be an earlier one?) But in this dictionary many of the words in one language are simply not translated but "used" in a multi-word item, which is then translated. (heedless: heedless OF sth: heedless of the danger, the regiment ... haciendo caso omiso del peligro, el regimiento ...). Maybe multi-word items are predominant because that's where common words acquire their meaning.
>
> As for Patrick Hanks' remarks on corpus, I totally agree, but I have a few observations. Everything depends on what you aim at with the examples of your dictionary (since this is what the corpus is used for). I was a critical observer at Cobuild 2 for a year. I started accepting what they called their "orthodoxy", of accepting all corpus material as evidence. I ended up being convinced that producing evidence on language is one thing, and writing a dictionary a different one. It's difficult to make a good product if you're at the same time trying to make a theoretical point.
> I do think, even allowing for some bias due to the almost exclusively written input of most corpora, that corpora reflect the real usage of words, on the condition that you know how to evaluate the data, i.e., introspectively. A great number of assertions, which I thought were beyond attack, crumbled when I started revising my own Portuguese-Spanish dictionary, and when I started learning Japanese a few years ago. In other words, when I started to confront my theoretical convictions with my practice as a language teacher and learner. If you want to teach a learner a word using not-meddled-with corpus examples, you need a lot of them. Say, twenty, depending on the case. And when you deal with a language pair of very different families -- Indo-European/Japanese -- you must forget about using "natural (corpus) examples" and resort to "grammatical examples", excluding exactly multi-word items.
> The main question is "what do I want my dictionary to help with?" If the answer is "describe the language", it seems beyond any doubt that corpora can help, or even "are" the dictionary. If the answer is "teach a language", which is what learner's dictionaries presumably aim at, then the answer is not so clear. Monolingual dictionaries were invented, très tardivement, with a mixture of political and scientific intentions. They are the treasure room of knowledge on language. The aims of dictionaries with a concrete aim, foreign language dictionaries, are wholly different and have to take into account the wishes of the audience, also the unconscious ones. A vast topic.
>
>
> Philippe Humblé
> Universidade Federal de Santa Catarina (Brasil)
>
>
> At 21:18 3/02/2004, you wrote:
>
> One discovery (that has implications for us) was when I was trying to think
> of English example words for each domain in my list of semantic domains. I
> found that a high percentage were multi-word lexical items. In some domains
> I quickly ran out of single word entries, but could think of lots of
> phrases. This phenomenon was repeated in a couple of workshops for Bantu
> languages. The speakers were generating about 25% phrases.
>
> I presume (without a lot of data to back me up) that our dictionaries should
> have a goodly percentage of multi-word entries. A quick scan of Longman's
> Language Activator shows about 50% multi-word entries. Can anyone give
> figures for their dictionaries? Has anyone worked at identifying/generating
> multi-word lexical items in such a way that they can estimate the percentage
> of idioms vs single words in a language? I realize that there is a gradation
> from collocation to idiom, so that it may be difficult to draw a line.
>
> Ron Moe
> SIL, Uganda
>
> -----Original Message-----
> From: List Facilitator [mailto:lexicography2004 at yahoo.com]
> Sent: Monday, February 02, 2004 10:11 PM
> To: lexicographylist at yahoogroups.com
> Subject: [Lexicog] Interesting lexical discoveries
>
>
> What are one or two of the most interesting discoveries that stand out for
> you (plural) in any of the lexical research that you have done?
>
>
>
> Wayne Leman
> Cheyenne dictionary project
>
>
>
> ----------
> Yahoo! Groups Links
> * To visit your group on the web, go to:
>   http://groups.yahoo.com/group/lexicographylist/
> * To unsubscribe from this group, send an email to:
>   lexicographylist-unsubscribe at yahoogroups.com
> * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service:
>   http://docs.yahoo.com/info/terms/
>