Corpora: Student needs info

Eric Atwell eric at comp.leeds.ac.uk
Mon May 27 11:43:44 UTC 2002


Rodrigo,
Please could you summarise any replies you get and post this summary
back to the CORPORA list - this may be useful to others building
corpora, including students here at Leeds University.

I suggest one place to start is ICAME, the International Computer
Archive of Modern and Medieval English, host of the CORPORA mailing list
and of the ICAME website http://www.hd.uib.no/icame.html

Info on the website which might help you includes Manuals for the corpora
distributed by ICAME; most include background info on how the corpora were
collected and tagged etc: http://khnt.hit.uib.no/icame/manuals/index.htm

ICAME also publishes ICAME Journal, with back issues online on the website;
ICAME Journal includes papers relevant to corpus building and tagging; you
could start with paper(s) on the language genre(s) you are interested in, e.g.:

Alejandro Curado Fuentes, "Exploitation and assessment of a Business English
corpus through language learning tasks", ICAME Journal Vol.26 pp5-32, 2002

Norma Pravec, "Survey of learner corpora", ICAME Journal Vol.26 pp81-114, 2002

Ma Dolores Ramirez Verdugo, "Non-native interlanguage intonation
systems: a study based on a computerised corpus of Spanish learners of
English", ICAME Journal Vol.26 pp115-132, 2002

Claudia Claridge, "Causal Clauses in written and speech-related genres
in Early Modern English", ICAME Journal Vol.25 pp31-64, 2001

Eric Atwell, George Demetriou, John Hughes, Amanda Schiffrin, Clive
Souter and Sean Wilcock, "A comparative evaluation of modern English
corpus grammatical annotation schemes", ICAME Journal Vol.24 pp7-24, 2000

Merja Kytö, Juhani Rudanko and Erik Smitterberg, "Building a bridge
between the present and the past: A corpus of 19th-century English",
ICAME Journal Vol.24 pp85-98, 2000

Winnie Cheng and Martin Warren, "Facilitating a description of
intercultural conversations: the Hong Kong Corpus of Conversational English",
ICAME Journal Vol.23 pp5-20, 1999

Manfred Markus, "Getting to grips with chips and Early Middle
English text variants: sampling Ancrene Riwle and Hali Meidenhad",
ICAME Journal Vol.23 pp35-52, 1999

Arja Nurmi, "The Corpus of Early English Correspondence Sampler (CEECS)",
ICAME Journal Vol.23 pp53-64, 1999

Tobias Rademann, "Using online electronic newspapers in modern English-language
press corpora: Benefits and pitfalls", ICAME Journal Vol.22 pp49-72, 1998

Minna Vihla, "Medicor: A corpus of contemporary American medical texts",
ICAME Journal Vol.22 pp73-80, 1998

Rainer Siemund and Claudia Claridge, "The Lampeter Corpus of Early Modern
English Tracts", ICAME Journal Vol.21 pp61-70, 1997

Gregory John Watson, "The Finnish-Australian English Corpus",
ICAME Journal Vol.20, pp41-70, 1996

Anneli Meurman-Solin, "A new tool: The Helsinki Corpus of Older Scots
(1450-1700)", ICAME Journal Vol.19, pp49-62, 1995

Roger Garside, "The marking of cohesive relationships: tools for the
construction of a large bank of anaphoric data",
ICAME Journal Vol.17 pp5-28, 1993

Merja Kytö and Matti Rissanen, "A language in transition: the Helsinki
corpus of English texts", ICAME Journal Vol.16, pp7-26, 1992

Elizabeth Green and Pam Peters, "The Australian Corpus project and
Australian English", ICAME Journal Vol.15 pp.37-54, 1991

Brian MacWhinney and Catherine Snow, "The Child Language Data Exchange
System CHILDES", ICAME Journal Vol.14 pp.3-25, 1990

Louis Milic, "A new historical corpus", ICAME Journal Vol.14, pp.26-39, 1990

Sidney Greenbaum, "The International Corpus of English",
ICAME Journal Vol.14 pp.106-108, 1990

Clive Souter, "The COMMUNAL project: extracting a grammar from the
Polytechnic of Wales Corpus", ICAME Journal Vol.13, pp.20-27, 1989

Nelleke Oostdijk, "A corpus for studying linguistic variation",
ICAME Journal Vol.12, pp3-14, 1988

Marion Owen, "Evaluating automatic grammatical tagging of text",
ICAME Journal Vol.11 pp.18-26, 1987

Pam Peters, "Towards a corpus of Australian English",
ICAME Journal Vol.11 pp.27-38, 1987

K Ahmad and G Corbett, "The Melbourne-Surrey Corpus",
ICAME Journal Vol.11 pp.39-43, 1987

Charles Meyer, "Punctuation practice in the Brown Corpus",
ICAME Journal Vol.10, pp.80-95, 1986

Barbara Booth, "Revising CLAWS", ICAME Journal Vol.9 pp.29-35, 1985

Geoffrey Leech, Roger Garside and Eric Atwell, "The Automatic Grammatical
Tagging of the LOB Corpus", ICAME Journal Vol.7 pp.13-33, 1983

J M Gill, "The Gill Corpus", ICAME Journal Vol. 4 pp.7-8, 1980

Louis Milic, "The Augustan Prose Sample and the Century of Prose Corpus",
ICAME Journal Vol.4, pp.11-12, 1980


ICAME Journal also includes reviews and abstracts of books and other
publications relevant to corpus building and annotation, as "pointers"
to the wider research literature.  However, NOTE that some of the
earlier papers cited above pre-date Windows-XP so the software may not
be readily re-usable on today's Windows-based PCs  :)

Last but DEFINITELY not least, I recommend the searchable ICAME
bibliography database recently put online by Knut Hofland:

http://korpus.hit.uib.no/icame/bib_search.html


I hope this helps

Eric Atwell

--
Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
School of Computing, University of Leeds, LEEDS LS2 9JT
TEL: 0113-2335430  MOBILE: 0775-1039104 FAX: 0113-2335468
WWW: http://www.comp.leeds.ac.uk/eric  EMAIL: eric at comp.leeds.ac.uk
--

On Sun, 26 May 2002, Rodrigo Tadeu Gonçalves wrote:

> Hi people,
>
> I'm looking for basic bibliography on corpus building, preferentially online
> materials (so far I have only good intentions and no knowledge) and
> Windows-based software for tagging and corpus building.
>
> Thanks in advance,
>
> Rodrigo T. Gonçalves