[Corpora-List] Punctuation follow up
Grant, T.
tg21 at leicester.ac.uk
Wed Jan 12 10:18:26 UTC 2005
Many thanks to:
Eric Atwell
Christopher Brewster
Gaël Dias
Jane Edwards
Nancy Ide
Raf Salkie
& Dominic Widdows
The corpora I've been referred to are:
BNC http://www.natcorp.ox.ac.uk/
The Susanne Corpus http://www.grsampson.net/Resources.html info: http://www.grsampson.net/RSue.html
and the American National Corpus (through its XML coding)
Several people also commented that many corpora do code punctuation but that retrieval can be tricky. Methods were generally corpus-specific, but one more general suggestion was to use the Java BreakIterator class (java.text.BreakIterator).
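For anyone wanting to try the BreakIterator suggestion, here is a minimal sketch, assuming plain untagged text as input. It walks the word boundaries that java.text.BreakIterator reports and keeps every non-whitespace piece, so punctuation marks come out as tokens of their own. The class and method names (PunctuationTokens, tokens, punctuation) are my own for illustration, and the exact boundaries around hyphens, apostrophes and runs of marks depend on the JDK's rules, so check them against your own data.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of any corpus toolkit.
class PunctuationTokens {

    // Cut the text at every word boundary BreakIterator reports and keep
    // the non-whitespace pieces, so punctuation survives as separate tokens.
    static List<String> tokens(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (!piece.trim().isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    // Keep only the tokens that do not start with a letter or digit,
    // which is a rough stand-in for "punctuation tokens".
    static List<String> punctuation(String text, Locale locale) {
        List<String> out = new ArrayList<>();
        for (String t : tokens(text, locale)) {
            if (!Character.isLetterOrDigit(t.codePointAt(0))) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "Hello, world!";
        System.out.println(tokens(sample, Locale.ENGLISH));
        System.out.println(punctuation(sample, Locale.ENGLISH));
    }
}
```

This only helps with raw text. For corpora that already mark up punctuation (e.g. in XML), reading the corpus's own coding is likely to be more reliable.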
Cautions and comments included watching out for differences between spoken and written English, and between American and British English.
For readings I was referred to:
Quirk, et al. (1985) A Comprehensive Grammar of the English Language. London; New York: Longman.
&
Parkes, M. B. (1993) Pause and Effect: An Introduction to the History of Punctuation in the West. Berkeley: University of California Press.
Actually distinguishing scare quotes from all the other uses of ' and " marks is tricky, but I'm catching a high proportion by searching for a single word between a pair of marks, e.g. { " _ " }
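The single-word heuristic can be sketched as a regular expression. This is only an illustration under my own assumptions, not the exact search used here: it takes raw text with straight ' and " marks (no typographic quotes), allows no spaces inside the marks, and requires that the opening mark is not preceded by a letter or digit and the closing mark is not followed by one, so that apostrophes in words like "Tim's" are skipped. The class name ScareQuoteCandidates and the method find are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the "one word between marks" search.
class ScareQuoteCandidates {

    // A quote mark, exactly one word (letters, digits, internal hyphens),
    // then the same quote mark again. The lookarounds reject marks that sit
    // inside a word, which filters out most apostrophes.
    private static final Pattern ONE_WORD_IN_QUOTES = Pattern.compile(
        "(?<![\\p{L}\\p{N}])([\"'])([\\p{L}\\p{N}][\\p{L}\\p{N}-]*)\\1(?![\\p{L}\\p{N}])");

    // Return the quoted single words, in order of appearance.
    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = ONE_WORD_IN_QUOTES.matcher(text);
        while (m.find()) {
            hits.add(m.group(2));
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "The \"experts\" said \"do not worry\" about the 'problem'.";
        System.out.println(find(sample));
    }
}
```

The hits are candidates only. Single-word titles, cited words and one-word direct speech will match as well, so the results still need checking by hand.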
Thank you again
Tim
______________________________________
Tim Grant
Forensic Section - School of Psychology
University of Leicester
106 New Walk
Leicester LE1 7EA
UK
TG21 at leicester.ac.uk
http://www.le.ac.uk/psychology/tg21/
+ 44(0)116 252 3658 (Direct Line) - + 44(0)116 252 2451 (Secretary) - + 44(0)116 252 3994 (Fax)