Corpora: Complete MICASE corpus available online

Rita Carol Simpson ritacsim at
Thu May 9 14:38:27 UTC 2002

The English Language Institute at the University of Michigan is pleased to
announce that phase one of the Michigan Corpus of Academic Spoken English
(MICASE) is now complete, and the corpus is available for research,
pedagogic applications, and general interest at
<>. MICASE consists of approximately 1.7
million words from 152 speech events recorded at the University of
Michigan between 1997 and 2001, covering a wide range of university spoken

The database is freely available (with some restrictions on use of
the corpus for commercial purposes) through the above-mentioned website,
which has a custom-designed search engine allowing for a variety of
searches that take advantage of the detailed speaker and speech event
attributes encoded in the corpus. The ELI is deeply grateful to the
Digital Library Production Services of the University Library for their
generous support of this project.

Further information about MICASE is also available through the ELI
website, <>, where we give
background information, composition of the corpus in terms of
percentages/words for major speaker and speech event categories, details
about the transcription and markup scheme, and a list of MICASE-inspired
presentations and publications.

Developments planned for the second phase of the project include tagging
the corpus for part-of-speech, generating a word list of academic speech
and related lexical statistics, posting selected soundfiles on the
website, and developing language teaching and testing materials,
as well as continued research on grammatical, lexical, and discoursal
features of academic speech.

The MICASE team currently includes Rita Simpson (project director), John
M. Swales (faculty advisor), and Sarah Briggs (testing advisor). General
inquiries about the project can be directed to Rita <ritacsim at>
or John <jmswales at>.
