[Corpora-List] Punctuation

Eric Atwell eric at comp.leeds.ac.uk
Tue Jan 11 16:56:33 UTC 2005


Tim,
most English corpora since the pioneering Brown and LOB in the 1960s have
included punctuation, so any of these might do.
The British National Corpus from the 1990s has the advantage of a www-based
trial search: you can "try before you buy" at
http://sara.natcorp.ox.ac.uk/lookup.html

For example, I tried the search term {'|"}
- a regular expression that finds all occurrences of ' or "
(usage depends on the original sources, so there is no corpus-wide
  standardised punctuation)
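
For anyone working on a local plain-text copy of a corpus rather than the
online search form, roughly the same search can be sketched in Python. This
is only an illustration: the function name find_quote_marks and the sample
sentence are made up, and it assumes straight ASCII quote characters. Note
that ' also matches apostrophes, as it would in the search above.

```python
import re

# Character class matching a straight single or double quote.
# Assumption: the text uses ASCII ' and ", not typographic quotes.
QUOTE_CHARS = re.compile(r"['\"]")

def find_quote_marks(text):
    """Return (offset, character) for every ' or " found in text."""
    return [(m.start(), m.group()) for m in QUOTE_CHARS.finditer(text)]

# Made-up sample sentence; the apostrophe in "didn't" is also reported.
sample = 'The so-called "experts" didn\'t agree.'
hits = find_quote_marks(sample)
for offset, char in hits:
    print(offset, char)
```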

I'm not sure how to identify all and only scare quotes via such regular
expressions... good luck!
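
One rough starting point, offered as a heuristic sketch rather than a
solution: treat a short double-quoted span (one to three words, no
punctuation inside, not introduced by a comma or colon) as a scare-quote
candidate. The word limit, the lookbehind and the name
scare_quote_candidates are all assumptions chosen for illustration. It will
still pick up titles and short quotations and will miss single-quoted
cases, so the hits need checking by hand.

```python
import re

# Heuristic pattern (an assumption, not a definition of scare quotes):
#   (?<![,:]\s)  the opening quote is not introduced by ", " or ": ",
#                which often signals reported speech
#   1 to 3 words made of letters and hyphens, nothing else inside
CANDIDATE = re.compile(
    r'(?<![,:]\s)"([A-Za-z][A-Za-z\-]*(?:\s[A-Za-z][A-Za-z\-]*){0,2})"'
)

def scare_quote_candidates(text):
    """Return the quoted words of each candidate span, in order."""
    return CANDIDATE.findall(text)

# Made-up sample: two likely scare quotes and one piece of reported speech.
sample = ('The so-called "experts" disagreed. She said, "I will go." '
          'His "green energy" plan failed.')
candidates = scare_quote_candidates(sample)
print(candidates)
```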

Eric Atwell, School of Computing, Leeds University


On Tue, 11 Jan 2005, Grant, T. wrote:

> I'm looking for a freely accessible English language corpus which allows analysis of punctuation marks - I'm interested for example in examining the use of scare quotes.
>
> Any ideas gratefully received.
>
> Tim
>
> ______________________________________
> Tim Grant
> Forensic Section - School of Psychology
> University of Leicester
> 106 New Walk
> Leicester LE1 7EA
> UK
>
> TG21 at leicester.ac.uk
> http://www.le.ac.uk/psychology/tg21/
>
> + 44(0)116 252 3658 (Direct Line) - + 44(0)116 252 2451 (Secretary) - + 44(0)116 252 3994 (Fax)
>
>
>

--
Eric Atwell, Senior Lecturer, Computer Vision and Language research group,
School of Computing, University of Leeds, LEEDS LS2 9JT, England
TEL: +44-113-2335430  FAX: +44-113-2335468  http://www.comp.leeds.ac.uk/eric


