Corpora: Summary of academic corpora responses

Paul Thompson p.a.thompson at reading.ac.uk
Mon Jun 10 12:39:19 UTC 2002


Nearly a month ago I posted a set of questions to this list about the
uses of corpora in research into academic discourse. Many thanks to all
those who took the time to contact me with information and comments - I
really appreciate the help!

Here is a summary of the responses:


Lynne Flowerdew referred me to her excellent review article,
"Corpus-based analyses in EAP" in J. Flowerdew (Ed.) Academic Discourse,
pp. 95-114. London: Longman (2002).

Beatriz Méndez built a corpus of eight articles from a high-impact-factor
radiology journal for her PhD thesis and used WordSmith Tools to analyse
it. She investigated combinatorial patterns to see how discourse is
organised in medical articles.

John Swales is working on a new book, "Research genres--explorations and
applications", which makes use of Ken Hyland's 80-article corpus (cf. Ken
Hyland's book "Disciplinary Discourses") and of MICASE (the Michigan
Corpus of Academic Spoken English, http://www.hti.umich.edu/m/micase/).
He also used these two corpora in the writing of the second edition of
"Academic writing for graduate students" (co-authored with Chris Feak).

Andreas Eriksson drew my attention to studies of tense and aspect in
academic discourse, recommending an MA dissertation by Li Vinh Taylor
(online citation: http://wwwlib.umi.com/dissertations/fullcit/MQ58551)
as a good overview of studies in this field.

Monica Hill and Annie Mueller wrote about the 'Vocabulary for specific
disciplines' project that they are involved in, at the English Centre,
University of Hong Kong. Monica Hill is the principal grant holder for
the project.

Monica wrote:

"Some colleagues and I have been working on a vocabulary project to help
tertiary level students work on discipline specific vocabulary. We are
investigating Law, Social Work, Business/Economics, Engineering and
Medicine.  Each of us has developed our own corpus by scanning in the
text books the students use in Year 1 at university, relevant academic
articles from the university press, and some general newspaper/magazine
articles - so we have a variety of genres. Each of us has about 500,000
words in our corpus, so it certainly isn't exhaustive, but by putting
the corpus through a word frequency analyser, it provides us with the
basic words the students need to know for that discipline ...

... The word frequency analysis is based on Nation and Laufer's Lexical
Frequency Profile, details of which are on Nation's very informative
site at Victoria University, Wellington.
http://www.vuw.ac.nz/lals/staff/paul_nation/index.html The academic
words are from Coxhead's Academic Word List (also at LALS, Wellington).

Using the profiler, we can highlight the different levels of words that
we want to investigate.  We can compare word frequencies across texts,
search for the first 1,000 most frequent words, or second thousand,
academic words and 'off-list' words which are those that have not
already been included in the other lists.  From this last group of
words, we then identify which words are most relevant to our students'
needs and we are developing a text based  vocabulary learning programme
containing exercises to help students use the words appropriately."
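
As a rough illustration of the kind of profiling Monica describes (this
is not the project's actual software), a minimal Python sketch might look
like the following. The three word lists are tiny hypothetical stand-ins:
a real profile would load the first-1,000 and second-1,000 frequency
bands and the Academic Word List from files and would match word
families, whereas this sketch only matches exact lower-cased word forms.
The function names are also my own invention.

```python
import re
from collections import Counter

# Hypothetical miniature word lists (assumption): placeholders for the
# real first-1,000, second-1,000 and Academic Word List files.
FIRST_1000 = {"the", "of", "a", "is", "in", "to", "and", "must", "be", "by"}
SECOND_1000 = {"engine", "measure"}
ACADEMIC = {"analyse", "data", "method"}

BANDS = ("first_1000", "second_1000", "academic", "off_list")


def tokenize(text):
    """Split text into lower-cased word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def band_of(word):
    """Return the name of the first list that contains the word."""
    if word in FIRST_1000:
        return "first_1000"
    if word in SECOND_1000:
        return "second_1000"
    if word in ACADEMIC:
        return "academic"
    return "off_list"


def band_counts(text):
    """Count how many tokens of the text fall into each band."""
    counts = Counter(band_of(token) for token in tokenize(text))
    return {band: counts[band] for band in BANDS}


def profile(text):
    """Return the proportion of tokens in each band (zeros if no tokens)."""
    counts = band_counts(text)
    total = sum(counts.values())
    if total == 0:
        return {band: 0.0 for band in BANDS}
    return {band: counts[band] / total for band in BANDS}


def off_list_words(text):
    """Frequency list of the words not covered by any of the lists."""
    return Counter(t for t in tokenize(text) if band_of(t) == "off_list")
```

Calling off_list_words() on a discipline's corpus and sorting by
frequency would give a candidate list from which the words most relevant
to students could then be picked by hand, as described above.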

Annie is looking at the "words 'engineers need to talk to each other'"
and has so far compiled a corpus of approx 300,000 words using textbooks
and journals given to her or recommended by lecturers in the faculty,
and several issues of an on-line journal. The corpus will be used to
give ideas about the words in use in an engineering context.

David Oakey wrote about his research work on lexical phrases in academic
writing (1998-2001) using the MicroConcord B excerpts of "academic"
prose available with MicroConcord, and the "academic writing" part of
the BNC v1.0 based on David Lee's categorisations. This work has been
written up for publication in the following 2 book chapters:

  * "A corpus-based study of the formal and functional variation of a
lexical phrase in different academic disciplines in English." in Reppen,
R., Biber, D., and Fitzmaurice, S. (forthcoming) (Eds.) Using Corpora to
Explore Linguistic Variation. New York: John Benjamins

  * "Lexical phrases for teaching academic writing in English: corpus
evidence." in Nuccorini, S. (forthcoming) (Ed.) ISP4 Proceedings (Title
not yet known). Peter Lang

Aquilino Sánchez sent information on the CUMBRE corpus, a 20-million-word
corpus of contemporary Spanish (up to 1996), not specifically an
academic corpus. Information can be found at: www.um.es/lacell

Paul

Original posting:

>>I am preparing to write a review article on the uses that have been
>>made of corpora in the study of academic discourse, such as in:
>>* research into the vocabulary or grammar of academic discourse
>>* rhetorical or discourse analysis of academic discourse
>>* the preparation of teaching materials for Language for Specific
>>  Purposes (for example, EAP) courses
>>* the provision of data for students to investigate either in language
>>  learning courses or in language study courses
>>* study of discourse varieties, or in cross-linguistic comparisons
>>
>>I’d like to ask members of the corpora list who have used corpora for
>>any of the above purposes (or know of others who have) the following
>>questions:
>>
>>What corpus/corpora did you work with? How was the corpus compiled?
>>What format is/was it in? Is it publicly available?
>>
>>How was the corpus analyzed / investigated? How many people,
>>approximately, have used the corpus (if it is possible to put a figure
>>on this)?
>>
>>When was the research / teaching done, and were there any end
>>products, such as software, books, journal articles?


******************************************************
Dr Paul Thompson
School of Linguistics and Applied Language Studies
Language Resource Centre
P. O. Box 241
The University of Reading
Reading RG6 6WB
Tel: 44 118 9316472
WWW: http://www.rdg.ac.uk/slals/
******************************************************


