[Corpora-List] Criteria for an ESP Vocabulary List

Thu Apr 24 08:47:33 UTC 2008

Hi Gill
I think you need both word frequency lists and n-grams etc.
A high frequency word may occur in a lot of lower-frequency patterns, so if you only look at n-grams, you might miss the word, because it did not feature in a high-frequency pattern…?
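
A toy illustration of this point, as a minimal sketch only: the token list is invented and the cut-off of 3 is an arbitrary assumption. The word "method" is frequent, but no single bigram containing it reaches the cut-off, so a list built only from frequent n-grams would miss it.

```python
from collections import Counter

# Invented toy text (an assumption for illustration, not real corpus data).
tokens = (
    "the proposed method works a novel method fails "
    "this method is simple an alternative method exists"
).split()

MIN_FREQ = 3  # arbitrary cut-off for "high frequency"

word_freq = Counter(tokens)
bigram_freq = Counter(zip(tokens, tokens[1:]))

# Bigrams that contain the word "method", with their counts.
method_bigrams = {bg: n for bg, n in bigram_freq.items() if "method" in bg}

frequent_words = {w for w, n in word_freq.items() if n >= MIN_FREQ}
frequent_bigrams = {bg for bg, n in bigram_freq.items() if n >= MIN_FREQ}
words_in_frequent_bigrams = {w for bg in frequent_bigrams for w in bg}

# Words that pass the cut-off on their own but appear in no frequent bigram.
missed = frequent_words - words_in_frequent_bigrams
print(word_freq["method"], max(method_bigrams.values()), sorted(missed))
```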

Best
Ramesh
________________________________
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Gill Philip
Sent: 24 April 2008 09:01
To: corpora at uib.no
Subject: Re: [Corpora-List] Criteria for an ESP Vocabulary List

dear Mahmud and list,

I agree with Adam here that vocabulary lists (aka word-lists) are probably not the best place to start from, whether for ESP or for ESL in general. The main reason for this is that, as we all know, the most frequent words are also the most polysemous and occur in the widest variety of structures. Certainly, collocation listings would be a more profitable place to start, as collocates limit the extent of the polysemes' meanings and behaviour; and generating 3-, 4- and 5-grams, skip-grams and concgrams would give a very good indication of what phraseology to include in the ESP syllabus. You could then extract vocabulary lists from those n-grams, focusing on the forms (and hence the meanings) that are actually used in the sector you're interested in.
This focus on meanings, rather than word forms, is discussed with reference to the AWL here:
K. Hyland & P. Tse (2007) "Is there an academic vocabulary?", TESOL Quarterly 41 (2), pp. 235-253
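
The n-gram and skip-gram generation mentioned above could be sketched roughly as follows. This is an assumption-laden illustration: the sample sentence is invented, the function names are hypothetical, and "skip-gram" is taken here to mean an ordered word pair with a limited number of intervening tokens, which is only one of several definitions in use. Concgrams are not covered.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skip_bigrams(tokens, max_skip):
    """Return ordered word pairs with at most max_skip tokens between them."""
    pairs = []
    for i, left in enumerate(tokens):
        # j runs over the next (max_skip + 1) positions, clipped to the text end.
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs

# Invented sample sentence (an assumption), whitespace-tokenised.
tokens = (
    "in contrast to the conventional method "
    "our proposed method improves accuracy"
).split()

# Pool 3-, 4- and 5-grams into one frequency table.
counts = Counter()
for n in (3, 4, 5):
    counts.update(ngrams(tokens, n))

print(counts.most_common(3))
print(skip_bigrams(tokens[:4], max_skip=1))
```

On a real ESP corpus the same counts would be taken over all files, and the most frequent sequences then inspected by hand before any vocabulary list is extracted from them.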

Food for thought.

best,
Gill

On 24/04/2008, Adam Turner <adam.turner at gmail.com> wrote:

With regard to ESP and ESL vocabulary learning, I am not sure that frequency is always the best guide for compiling such lists, and it may not be as useful to instructors as some researchers think. As a classroom instructor, I find that corpus research for education, divorced from context and genre, sometimes has limited uses. Unsurprisingly, I find that the most frequent words are the ones that students already know. I read a highly detailed article on a concordance analysis of engineering English with all kinds of frequency statistics, but it held very little of interest to me as an instructor despite the amount of work it took to create. In other words, when we move from description to teaching, the value of corpus analysis may differ.

More useful, however, are collocation lists or lexical chunks within a specific genre of writing and/or field of research. A word like method may appear frequently but almost all students know this word. They may not, however, be able to exploit all the collocations/frames/lexical chunks/fixed expressions where it might occur in an ESP context:

Collocation: novel/proposed/innovative/alternative + method.
Chunk/frame example with comparative:   In contrast to the conventional method A, our proposed method B improves accuracy/reliability by C .....
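
A collocation list of this kind could be pulled from a corpus along the following lines. This is a minimal sketch under stated assumptions: the sentences are invented, `left_collocates` is a hypothetical helper, and raw co-occurrence counts stand in for the association measures (MI, t-score and so on) that a concordancer would normally report.

```python
from collections import Counter

def left_collocates(tokens, node, window=1):
    """Count words occurring within `window` tokens to the left of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])
    return counts

# Invented sentences (an assumption), lower-cased and whitespace-tokenised.
text = (
    "we present a novel method for fatigue testing "
    "the proposed method improves accuracy "
    "an alternative method is described "
    "the proposed method reduces cost"
)
collocates = left_collocates(text.split(), "method")
print(collocates.most_common())
```

Widening `window` would also pick up determiners and other function words, so a stop list or an association measure would probably be needed on real data.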

In addition, I think it would be more valuable to have the students work on and select the vocabulary words that they don't know than to have the instructor prepare them for the students, whether using electronic corpora or not. I don't have the reference handy, but recent work has cast some doubt on the universality of even well-known academic word lists when searching across disciplines. Words like "stress" being used very differently in mechanical or civil engineering than in other fields is an oft-cited example.

I think it would depend on what kind of corpus you have and how specific your audience is. If it is not too specific, you could get a lot of use out of Google Scholar discipline-specific searches to roughly gauge frequency, for example. If it is a highly specific corpus, then students could probably just select their own words from readings and give them to you.

My students want to know not how frequent a word is but whether or not that particular word is the appropriate one for that particular sentence in the context of the paragraph they are writing in the context of the genre of writing they are doing.

I am, however, very convinced of the benefits of combining genre analysis and concordancing, and of the value of examining how collocations and lexical chunks behave within a particular genre and register of a type of writing in a particular field. After teaching engineering writing classes, I do wonder, however, whether there is such a thing as, say, "mechanical" engineering in terms of the language of a discipline, as the field can be quite diverse, from automotive engineering to CAD design to fluid dynamics. This goes back to the old debate over how specific ESP teaching should be.

Even when vocabulary work is well informed by corpora, we often still get fill-in-the-blank exercises from it, which all teachers know test only lower-level recognition and passive vocabulary that the students can't always activate in productive skills like writing and speaking. We all do them because they are easy to make and score.

Continuing a previous thread on the relationship between research and teaching, I would be interested in hearing researchers' perspectives on why it is so important to concentrate on frequency so much. I think more useful work could be done at the phrase/frame/chunk level for classroom applications.

Adam

On Thu, Apr 24, 2008 at 2:41 PM, True Friend <true.friend2004 at gmail.com> wrote:

Hi
I am working on an ESP project and have to generate vocabulary lists. What are the best criteria for generating a vocabulary list: frequency, or range (the number of files in the corpus in which a word occurs, i.e. how widely it is used across the corpus)? Keyword generators such as AntConc and WordSmith Tools work on the basis of frequency. They compare the specialized corpus with a reference corpus and list the words that are more frequent in the specialized corpus and less frequent in the reference corpus. Frequency is a fine basis, but range has its importance too: a word that is very frequent but used in only 10 files is less important than a less frequent word found in more files. So what are your suggestions? Personally I would prefer frequency, because there is no software available to generate keywords on the basis of range or ranking, or to arrange the words from a list on the basis of their range (i.e. the word with the widest range gets rank 1, and so on).
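
For anyone who wants to try the range-based ordering described above, a rough sketch follows. It assumes the corpus is already split into tokenised files; the three "files" are invented and `frequency_and_range` is a hypothetical helper, not a feature of AntConc or WordSmith Tools.

```python
from collections import Counter

def frequency_and_range(docs):
    """Return (freq, rng): total occurrences and number of files per word."""
    freq, rng = Counter(), Counter()
    for tokens in docs:
        freq.update(tokens)
        rng.update(set(tokens))  # each file counts at most once per word
    return freq, rng

# Three invented "files" (an assumption), already tokenised.
docs = [
    "stress stress stress stress load".split(),
    "load method beam".split(),
    "method load test".split(),
]
freq, rng = frequency_and_range(docs)

# Rank by range first, then by frequency, then alphabetically to break ties.
ranked = sorted(freq, key=lambda w: (-rng[w], -freq[w], w))
for rank, word in enumerate(ranked, start=1):
    print(rank, word, "range:", rng[word], "freq:", freq[word])
```

In this toy data "stress" is the most frequent word but occurs in only one file, so it ranks below "load" and "method", which is the behaviour the range criterion is meant to capture.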
Regards
--
محمد شاکر عزیز

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

--
Adam Turner

Director
English Writing Lab
Hanyang University
Center for Teaching and Learning
Seoul, Korea
http://ctl.hanyang.ac.kr/writing/

--
*********************************
Dr. Gill Philip
CILTA
Università degli Studi di Bologna
Piazza San Giovanni in Monte, 4
40124 Bologna
Italy