[Corpora-List] American and British English spelling converter

Eric Atwell eric at comp.leeds.ac.uk
Fri Nov 3 10:15:38 UTC 2006


It may not be obvious to CORPORA readers who don't know Martin Wynne, 
but this MUST have been a tongue-in-cheek comment! The underlying
message is that the BNC provides empirical evidence that many traditional
distinctions between US and UK English spelling and vocabulary are
breaking down, as both US and UK traditional spellings are
interchangeably accepted worldwide and even in Britain.
I wonder if American corpora, e.g. the ANC, have evidence of British spellings?

I'm currently looking into which English dominates the World Wide Web:
British or American? I've collected a small web-as-corpus from UK and US
domains, to compare with other English web-as-corpus samples taken from about 
100 other national domains. Can anyone point me at other studies 
comparing/assessing uptake of British v American English on the WWW
outside the UK and USA?
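
As a rough illustration of the kind of count involved, and of the dictionary
lookup a simple converter might use, here is a minimal Python sketch. The
word list, the names VARIANT_PAIRS, count_variants and convert, and the
"uk"/"us" labels are all assumptions made for this example, not an existing
tool. It matches whole words only, so inflected forms such as 'realized' are
missed, and labelling 'realize' as American is a simplification given the BNC
evidence discussed in this thread.

```python
import re
from collections import Counter

# Hypothetical, hand-picked sample of (British, American) pairs.
# A real study or converter would need a much larger, curated list.
VARIANT_PAIRS = [
    ("realise", "realize"),
    ("colour", "color"),
    ("centre", "center"),
    ("analyse", "analyze"),
]

UK_TO_US = dict(VARIANT_PAIRS)
US_TO_UK = {us: uk for uk, us in VARIANT_PAIRS}

WORD_RE = re.compile(r"[A-Za-z]+")


def count_variants(text):
    """Count tokens that exactly match a listed British or American form."""
    counts = Counter()
    for token in WORD_RE.findall(text.lower()):
        if token in UK_TO_US:
            counts["uk"] += 1
        elif token in US_TO_UK:
            counts["us"] += 1
    return counts


def convert(text, mapping):
    """Replace listed whole-word forms via mapping, keeping an initial capital."""
    def swap(match):
        word = match.group(0)
        target = mapping.get(word.lower())
        if target is None:
            return word  # not in the list: leave the word untouched
        return target.capitalize() if word[0].isupper() else target
    return WORD_RE.sub(swap, text)
```

Calling count_variants on each national sample would give a crude British v
American tally per domain, and convert(text, UK_TO_US) or
convert(text, US_TO_UK) rewrites only the listed forms in either direction.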

thanks

Eric Atwell, Leeds University


On Thu, 2 Nov 2006, Martin Wynne wrote:

> If you find such a program, let us know, and we can run it over the BNC and 
> change the 5849 occurrences of 'realize' and inflected forms to 'realise' 
> etc., and otherwise correct British English to your preferred spellings ;)
>
> Martin Krallinger wrote:
>
>> Dear all,
>> 
>> I was looking for some simple tool (preferably in Python) which
>> is able to do automatic conversion of texts (or words) from
>> British English (UK) to American (US)  English and vice versa.
>> (Example:  realize <-> realise)
>> 
>> This seems to be an easy task, but I could not find any ready-to-use
>> standalone tool capable of performing this task.
>> 
>> I want to integrate this application into an Information extraction system
>> which handles scientific literature.
>> 
>> I am also interested in references where aspects related to US/UK English
>> spelling have been analyzed in the context of information extraction, text
>> mining and terminology extraction.
>> 
>> Best regards,
>> 
>> 
>> Martin
>> 
>> 
>
>

-- 
Eric Atwell,
Senior Lecturer, Language research group leader, School of Computing,
Faculty of Engineering, University of Leeds, LEEDS LS2 9JT, England
TEL: +44-113-3435430  FAX: +44-113-3435468  http://www.comp.leeds.ac.uk/eric


