[Corpora-List] Punctuation follow up

Grant, T. tg21 at leicester.ac.uk
Wed Jan 12 10:18:26 UTC 2005


Many thanks to:

Eric Atwell
Christopher Brewster 
Gaël Dias
Jane Edwards
Nancy Ide
Raf Salkie
& Dominic Widdows

The corpora I've been referred to are

BNC http://www.natcorp.ox.ac.uk/
The Susanne Corpus http://www.grsampson.net/Resources.html info: http://www.grsampson.net/RSue.html
and the American National Corpus (through its XML coding)

There were also various comments suggesting that many corpora code punctuation but that retrieval can be tricky. Methods are generally corpus-specific, but one more general suggestion was to use the Java BreakIterator class.
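
As a rough illustration of the BreakIterator suggestion, the sketch below uses the word instance of java.text.BreakIterator to split a string into tokens and keeps those whose first character is neither a letter, a digit, nor whitespace. The class name PunctuationTokens and the method punctuation are made up for this example, and treating "not a letter, digit or space" as punctuation is my assumption, not something the respondents specified.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: the class and method names are illustrative only.
final class PunctuationTokens {

    // Returns the tokens that look like punctuation, in order of appearance.
    public static List<String> punctuation(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> marks = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one token.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            int cp = token.codePointAt(0);
            // Assumption: anything that is not a letter, digit or whitespace
            // is counted as punctuation (this also catches symbols such as $).
            if (!Character.isLetterOrDigit(cp) && !Character.isWhitespace(cp)) {
                marks.add(token);
            }
        }
        return marks;
    }

    public static void main(String[] args) {
        System.out.println(punctuation("Hello, world!"));
    }
}
```

This works on plain text only; for a corpus with its own markup (such as the XML-coded ones above) the corpus-specific route is probably still the safer one.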

Cautions and comments included watching out for differences between spoken and written English, and also between American and British English.

For readings I was referred to:
Quirk, et al. (1985) A Comprehensive Grammar of the English Language.
     London; New York: Longman.
&
Parkes, M. B. (1993) Pause and Effect: An Introduction to the History of Punctuation in the West.
     Berkeley: University of California Press.

Actually distinguishing scare quotes from all the other uses of ' & " marks is tricky, but I'm getting a high proportion by searching for a single word between marks, e.g. { " _ " }
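
For anyone wanting to try the same heuristic, here is a minimal sketch of one way to express it as a regular expression in Java. It is not the exact search I ran: the class name ScareQuoteCandidates, the method candidates, and the details of the pattern (straight quotes only, no spaces inside the marks, hyphenated words allowed, the opening mark must not follow a letter or digit) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: names and pattern details are illustrative assumptions.
final class ScareQuoteCandidates {

    // A quote mark, exactly one word (optionally hyphenated), then the same mark.
    // The lookbehind skips apostrophes inside words (it's, don't) and closing
    // marks; the lookahead skips marks that run straight into another word.
    private static final Pattern SINGLE_QUOTED_WORD = Pattern.compile(
        "(?<![\\p{L}\\p{N}])([\"'])([\\p{L}\\p{N}]+(?:-[\\p{L}\\p{N}]+)*)\\1(?![\\p{L}\\p{N}])");

    // Returns the words found between a matching pair of quote marks.
    public static List<String> candidates(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = SINGLE_QUOTED_WORD.matcher(text);
        while (m.find()) {
            words.add(m.group(2));
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "The so-called \"experts\" said \"we do not know\" and it's 'fine'.";
        System.out.println(candidates(sample));
    }
}
```

The matches are only candidates: single-word titles, one-word quotations and cited word forms will also be picked up, so the hits still need checking by hand.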

Thank you again

Tim

______________________________________
Tim Grant
Forensic Section - School of Psychology
University of Leicester
106 New Walk
Leicester LE1 7EA
UK

TG21 at leicester.ac.uk
http://www.le.ac.uk/psychology/tg21/

+ 44(0)116 252 3658 (Direct Line) - + 44(0)116 252 2451 (Secretary) - + 44(0)116 252 3994 (Fax)


