[Corpora-List] American and British English spelling converter

Martin Krallinger martink at cnb.uam.es
Fri Nov 3 11:20:48 UTC 2006


Dear all,

Just to clarify the motivation behind my question (UK/US spelling 
conversion): I am not a linguist. I work in a cancer research center, 
and I want to combine bioinformatics tools with IE and text mining 
systems. The spelling example I used before was taken from the PubMed 
database (perhaps not the best choice of example):

realize:
'By working toward team care, hospitals may achieve a successful 
intensivist model, and patients may realize the benefits of spending 
less for healthcare and living longer. '
[PMID:17077695]

realise:
'However, these experiences have also illuminated a number of critical 
challenges that will have to be addressed in the development of 
effective drugs across different cancers, to fully realise the potential 
of individualised molecular therapy.'
[PMID:17059381]

In the life sciences, people are interested in using ontologies (e.g. Gene 
Ontology), controlled vocabularies and information extraction systems to 
make information access more efficient. The biomedical literature is 
written mainly in English, but by authors with many different native 
languages, so I suppose most articles are in either UK or US English. 
(For a study of the effect of different native languages on the writing 
of biomedical literature, see Netzel et al., 
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1319188).

This mix of spelling conventions makes information extraction, and the 
mapping of terms derived from existing biomedical ontologies, quite 
challenging.

I want to use a spelling converter ONLY as a way to 'normalize' a large 
collection of biomedical text for subsequent IE, IR, document 
categorization and term mapping, not for extensive lexical, grammatical 
and idiomatic analysis.
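
To make the kind of normalization I mean concrete, here is a minimal 
sketch in Python. It is only an illustration under stated assumptions: 
the word list UK_TO_US is a tiny hand-picked sample (a real system would 
need a much larger curated list of variants), and the names 
normalize_spelling and _match_case are my own, not taken from any 
existing tool. It does a whole-word dictionary lookup, so it deals with 
spelling only and ignores lexical, grammatical and idiomatic differences.

```python
import re

# Hypothetical, hand-picked UK -> US pairs for illustration only; a real
# system would load a much larger curated variant list.
UK_TO_US = {
    "realise": "realize",
    "individualised": "individualized",
    "tumour": "tumor",
    "haemoglobin": "hemoglobin",
}

# Runs of ASCII letters are treated as words; everything else is left alone.
WORD_RE = re.compile(r"[A-Za-z]+")


def _match_case(source, target):
    """Copy simple capitalisation (UPPER or Title) from source onto target."""
    if source.isupper():
        return target.upper()
    if source[0].isupper():
        return target[0].upper() + target[1:]
    return target


def normalize_spelling(text, mapping=UK_TO_US):
    """Replace whole words found in `mapping`; leave all other text untouched."""
    def replace(match):
        word = match.group(0)
        target = mapping.get(word.lower())
        return _match_case(word, target) if target else word
    return WORD_RE.sub(replace, text)


if __name__ == "__main__":
    sentence = ("to fully realise the potential of "
                "individualised molecular therapy")
    print(normalize_spelling(sentence))
```

Inverting the dictionary ({us: uk for uk, us in UK_TO_US.items()}) would 
give the opposite direction, as long as each variant maps to exactly one 
counterpart. Looking up whole words rather than substrings is a 
deliberate choice here, so that identifiers such as gene names which 
merely contain a listed string are not altered.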

Best regards,

Martin Krallinger



>It would be a grave mistake to think that the only difference between
>British and American English is a few wayward spellings. There are
>considerable and extensive lexical, grammatical and idiomatic
>differences. The 1st and 3rd of those are more or less well known, but
>the grammatical differences never cease to surprise me. I'd be
>moderately interested to see what other examples corpora listers come up
>with (though no doubt they will also remind me that there are
>significant differences in usage between American dialects, not to
>mention Canadian etc)
>
>To give just one example of each:
>
>Lift vs elevator
>Have you got vs do you have
>Half four vs 4:30
>
>Harold Somers  
>
>>-----Original Message-----
>>>Martin Krallinger wrote:
>>>
>>>>Dear all,
>>>>
>>>>I was looking for some simple tool (preferable in Python) which is 
>>>>able to do automatic conversion of texts (or words) from British 
>>>>English (UK) to American (US) English and vice versa.
>>>>(Example:  realize <-> realise)
>>>>
>>>>This seems to be an easy task, but I could not find any ready to use 
>>>>stand alone tool capable of performing this task.
>>>>
>>>>I want to integrate this application into an Information extraction 
>>>>system which handles scientific literature.
>>>>
>>>>I am also interested in references where aspects related to US/UK 
>>>>English spelling has been analyzed in the context of information 
>>>>extraction, text mining and terminology extraction.
>>>>
>>>>Best regards,
>>>>
>>>>Martin


