[Corpora-List] American and British English spelling converter
Eric Atwell
eric at comp.leeds.ac.uk
Fri Nov 3 10:15:38 UTC 2006
It may not be obvious to CORPORA readers who don't know Martin Wynne,
but this MUST have been a tongue-in-cheek comment! The underlying
message is that the BNC provides empirical evidence that many traditional
distinctions between US and UK English spelling and vocabulary are
breaking down, as both US and UK traditional spellings are
interchangeably accepted worldwide and even in Britain.
I wonder if American corpora, e.g. the ANC, have evidence of British spellings?
I'm currently looking into which English dominates the World Wide Web:
British or American? I've collected a small web-as-corpus from UK and US
domains, to compare with other English web-as-corpus samples taken from about
100 other national domains. Can anyone point me at other studies
comparing or assessing the uptake of British v. American English on the WWW
outside the UK and USA?
Thanks,
Eric Atwell, Leeds University
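
A comparison of this kind could start from simply counting known spelling
variants in each sample. The following is only a minimal sketch in Python,
not the method used for the study described above: the VARIANT_PAIRS list
and the count_variants function are hypothetical names, the list is far too
small for real use, and only the listed base forms are counted (inflected
forms such as 'realised' are ignored).

```python
import re
from collections import Counter

# Hypothetical, tiny list of (British, American) spelling pairs.
# A real study would need a much larger, curated list.
VARIANT_PAIRS = [
    ("realise", "realize"),
    ("colour", "color"),
    ("centre", "center"),
    ("analyse", "analyze"),
]


def count_variants(text, pairs=VARIANT_PAIRS):
    """Return (british_count, american_count) for the listed base forms."""
    # Lower-case the text and count every alphabetic token once.
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    british = sum(tokens[uk] for uk, _ in pairs)
    american = sum(tokens[us] for _, us in pairs)
    return british, american


sample = "We realise the colour of the center is hard to analyze."
print(count_variants(sample))  # (2, 2)
```

Running the same count over each national-domain sample would give a rough
British/American ratio per domain, subject to the coverage of the pair list.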
On Thu, 2 Nov 2006, Martin Wynne wrote:
> If you find such a program, let us know, and we can run it over the BNC and
> change the 5849 occurrences of 'realize' and inflected forms to 'realise'
> etc., and otherwise correct British English to your preferred spellings ;)
>
> Martin Krallinger wrote:
>
>> Dear all,
>>
>> I was looking for some simple tool (preferably in Python) which
>> is able to do automatic conversion of texts (or words) from
>> British English (UK) to American (US) English and vice versa.
>> (Example: realize <-> realise)
>>
>> This seems to be an easy task, but I could not find any ready-to-use
>> standalone tool capable of performing this task.
>>
>> I want to integrate this application into an information extraction system
>> which handles scientific literature.
>>
>> I am also interested in references where aspects related to US/UK English
>> spelling have been analyzed in the context of information extraction, text
>> mining and terminology extraction.
>>
>> Best regards,
>>
>>
>> Martin
>>
>>
>
>
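
The conversion asked about in the quoted request can be sketched as a plain
word-list lookup. This is an illustration only, not an existing tool: the
UK_TO_US mapping, convert and _match_case are hypothetical names, and the
mapping is deliberately tiny. A lookup like this cannot handle spellings that
depend on word class or sense (e.g. practise/practice), and, as the BNC counts
quoted above suggest, forms such as 'realize' are already common in British
text.

```python
import re

# Hypothetical, deliberately tiny British -> American mapping.
UK_TO_US = {
    "realise": "realize",
    "realised": "realized",
    "colour": "color",
    "centre": "center",
    "analyse": "analyze",
}
# Reverse mapping for American -> British conversion.
US_TO_UK = {us: uk for uk, us in UK_TO_US.items()}


def _match_case(source, target):
    """Give target the same capitalisation pattern as source."""
    if source.isupper():
        return target.upper()
    if source[0].isupper():
        return target.capitalize()
    return target


def convert(text, mapping):
    """Replace every word found in mapping, leaving all other text as is."""
    def replace(match):
        word = match.group(0)
        target = mapping.get(word.lower())
        return _match_case(word, target) if target else word

    return re.sub(r"[A-Za-z]+", replace, text)


print(convert("We realised the Colour was wrong.", UK_TO_US))
# We realized the Color was wrong.
print(convert("The center is COLOR-coded.", US_TO_UK))
# The centre is COLOUR-coded.
```

Listing inflected forms explicitly, as with 'realised' here, keeps the sketch
simple; a rule-based suffix approach would be shorter but risks false matches.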
--
Eric Atwell,
Senior Lecturer, Language research group leader, School of Computing,
Faculty of Engineering, University of Leeds, LEEDS LS2 9JT, England
TEL: +44-113-3435430 FAX: +44-113-3435468 http://www.comp.leeds.ac.uk/eric