Corpora: Summary of replies
Rodrigo Tadeu Gonçalves
acollon at ig.com.br
Wed May 29 17:12:47 UTC 2002
Hello,
I'm posting a summary of the answers I got to my last request for help:
Kiril Simov <kivs at bultreebank.org> wrote:
Dear Rodrigo,
Please check our system for corpus development, CLaRK. You can
download it from:
http://www.BulTreeBank.org
by following the CLaRK system link.
----------
Rita Carol Simpson <ritacsim at umich.edu> wrote:
Hello,
Our website on the MICASE corpus has some pages that deal specifically
with transcription and markup of a corpus of spoken English. These may
be useful to you insofar as they relate directly to issues involved in
corpus-building.
----------
Eric Atwell wrote:
Rodrigo,
Please could you summarise any replies you get and post this summary
back to the CORPORA list - this may be useful to others building
corpora, including students here at Leeds University.
I suggest one place to start is ICAME, the International Computer
Archive of Modern and Medieval English, host of the CORPORA mailing list
and of the ICAME website http://www.hd.uib.no/icame.html
Info on the website which might help you includes Manuals for the corpora
distributed by ICAME; most include background info on how the corpora were
collected and tagged etc: http://khnt.hit.uib.no/icame/manuals/index.htm
ICAME also publishes ICAME Journal, with back issues online on the website;
ICAME Journal includes papers relevant to corpus building and tagging, you
could start with paper(s) on the language genre(s) you are interested in,
eg:
Alejandro Curado Fuentes, "Exploitation and assessment of a Business English
corpus through language learning tasks", ICAME Journal Vol.26 pp5-32, 2002
Norma Pravec, "Survey of learner corpora", ICAME Journal Vol.26 pp81-114,
2002
Ma Dolores Ramirez Verdugo, "Non-native interlanguage intonation
systems: a study based on a computerised corpus of Spanish learners of
English", ICAME Journal Vol.26 pp115-132, 2002
Claudia Claridge, "Causal Clauses in written and speech-related genres
in Early Modern English", ICAME Journal Vol.25 pp31-64, 2001
Eric Atwell, George Demetriou, John Hughes, Amanda Schiffrin, Clive
Souter and Sean Wilcock, "A comparative evaluation of modern English
corpus grammatical annotation schemes", ICAME Journal Vol.24 pp7-24, 2000
Merja Kytö, Juhani Rudanko and Erik Smitterberg, "Building a bridge
between the present and the past: A corpus of 19th-century English",
ICAME Journal Vol.24 pp85-98, 2000
Winnie Cheng and Martin Warren, "Facilitating a description of
intercultural conversations: the Hong Kong Corpus of Conversational English",
ICAME Journal Vol.23 pp5-20, 1999
Manfred Markus, "Getting to grips with chips and Early Middle
English text variants: sampling Ancrene Riwle and Hali Meidenhad",
ICAME Journal Vol.23 pp35-52, 1999
Arja Nurmi, "The Corpus of Early English Correspondence Sampler (CEECS)",
ICAME Journal Vol.23 pp53-64, 1999
Tobias Rademann, "Using online electronic newspapers in modern
English-language press corpora: Benefits and pitfalls",
ICAME Journal Vol.22 pp49-72, 1998
Minna Vihla, "Medicor: A corpus of contemporary American medical texts",
ICAME Journal Vol.22 pp73-80, 1998
Rainer Siemund and Claudia Claridge, "The Lampeter Corpus of Early Modern
English Tracts", ICAME Journal Vol.21 pp61-70, 1997
Gregory John Watson, "The Finnish-Australian English Corpus",
ICAME Journal Vol.20, pp41-70, 1996
Anneli Meurman-Solin, "A new tool: The Helsinki Corpus of Older Scots
(1450-1700)", ICAME Journal Vol.19, pp49-62, 1995
Roger Garside, "The marking of cohesive relationships: tools for the
construction of a large bank of anaphoric data",
ICAME Journal Vol.17 pp5-28, 1993
Merja Kytö and Matti Rissanen, "A language in transition: the Helsinki
corpus of English texts", ICAME Journal Vol.16, pp7-26, 1992
Elizabeth Green and Pam Peters, "The Australian Corpus project and
Australian English", ICAME Journal Vol.15 pp.37-54, 1991
Brian MacWhinney and Catherine Snow, "The Child Language Data Exchange
System CHILDES", ICAME Journal Vol.14 pp.3-25, 1990
Louis Milic, "A new historical corpus", ICAME Journal Vol.14, pp.26-39, 1990
Sidney Greenbaum, "The International Corpus of English",
ICAME Journal Vol.14 pp.106-108, 1990
Clive Souter, "The COMMUNAL project: extracting a grammar from the
Polytechnic of Wales Corpus", ICAME Journal Vol.13, pp.20-27, 1989
Nelleke Oostdijk, "A corpus for studying linguistic variation",
ICAME Journal Vol.12, pp3-14, 1988
Marion Owen, "Evaluating automatic grammatical tagging of text",
ICAME Journal Vol.11 pp.18-26, 1987
Pam Peters, "Towards a corpus of Australian English",
ICAME Journal Vol.11 pp.27-38, 1987
K Ahmad and G Corbett, "The Melbourne-Surrey Corpus",
ICAME Journal Vol.11 pp.39-43, 1987
Charles Meyer, "Punctuation practice in the Brown Corpus",
ICAME Journal Vol.10, pp.80-95, 1986
Barbara Booth, "Revising CLAWS", ICAME Journal Vol.9 pp.29-35, 1985
Geoffrey Leech, Roger Garside and Eric Atwell, "The Automatic Grammatical
Tagging of the LOB Corpus", ICAME Journal Vol.7 pp.13-33, 1983
J M Gill, "The Gill Corpus", ICAME Journal Vol. 4 pp.7-8, 1980
Louis Milic, "The Augustan Prose Sample and the Century of Prose Corpus",
ICAME Journal Vol.4, pp.11-12, 1980
ICAME Journal also includes reviews and abstracts of books and other
publications relevant to corpus building and annotation, as "pointers"
to the wider research literature. However, NOTE that some of the
earlier papers cited above pre-date Windows-XP so the software may not
be readily re-usable on today's Windows-based PCs :)
Last but DEFINITELY not least, I recommend the searchable ICAME
bibliography database recently put online by Knut Hofland:
http://korpus.hit.uib.no/icame/bib_search.html
----------
I'd like to thank you all for helping me with the links and bibliography.
We're trying to start a project to encode a corpus of graded texts for
Brazilian learners of English at the Federal University of Parana, as my
end-of-course monograph.
Thanks again for the attention,
Best wishes,
Rodrigo T. Gonçalves