[Corpora-List] Frequency list of transformations

Marijke Koster marijke at polderland.nl
Fri Jan 21 08:44:46 UTC 2005


Dear corpora list members,

Does anyone have a suggestion for a simple method or script to extract
a frequency list of transformations from a list of spelling errors and
their corrections?

For example, take this tab-separated list:

wrong      correct
-----      -------
occurence  occurrence
occosion   occasion
commputer  computer
live       life
heavie     heavy
geat       great
save       safe

After applying the method, the result should look something like this
(counts of correct -> wrong substitutions):
1 rr -> r
1 a  -> o
1 m  -> mm
2 f  -> v
1 y  -> ie
1 r  -> ()
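
One possible approach, offered as a rough sketch rather than an established method: strip the longest common prefix and suffix from each correct/wrong pair and count what is left in the middle. The function names (transformation, count_transformations) and the PAIRS list are hypothetical, chosen only for illustration. The sketch assumes each pair contains a single contiguous error, and it adds the preceding letter as context only when a letter was doubled or undoubled, which is a guess at the convention used in the list above.

```python
from collections import Counter


def transformation(correct, wrong):
    """Return (src, dst): the part of `correct` that was replaced and
    what replaced it in `wrong`. Assumes one contiguous error per pair."""
    limit = min(len(correct), len(wrong))
    # Length of the common prefix.
    p = 0
    while p < limit and correct[p] == wrong[p]:
        p += 1
    # Length of the common suffix, not overlapping the prefix.
    s = 0
    while s < limit - p and correct[len(correct) - 1 - s] == wrong[len(wrong) - 1 - s]:
        s += 1
    src = correct[p:len(correct) - s]
    dst = wrong[p:len(wrong) - s]
    # Pure insertion or deletion of a letter identical to the one before it:
    # report it with that letter as context (rr -> r, m -> mm).
    if p > 0 and (src == "" or dst == "") and set(src + dst) == {correct[p - 1]}:
        src = correct[p - 1] + src
        dst = correct[p - 1] + dst
    return src, dst


def count_transformations(pairs):
    """Count transformations over (wrong, correct) pairs, in that column order."""
    counts = Counter()
    for wrong, correct in pairs:
        if wrong != correct:  # identical pairs carry no transformation
            counts[transformation(correct, wrong)] += 1
    return counts


# The sample pairs from the list above, as (wrong, correct).
PAIRS = [
    ("occurence", "occurrence"),
    ("occosion", "occasion"),
    ("commputer", "computer"),
    ("live", "life"),
    ("heavie", "heavy"),
    ("geat", "great"),
    ("save", "safe"),
]

for (src, dst), n in count_transformations(PAIRS).most_common():
    # An empty side is printed as "()", following the notation above.
    print(f"{n} {src or '()'} -> {dst or '()'}")
```

For a real tab-separated file, the pairs could be read with line.rstrip("\n").split("\t") after skipping the two header lines. Words with two or more separate errors would come out as one larger span under this sketch; a proper edit-distance alignment would be needed to split them.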

Thanks in advance,
Marijke Koster
______________________________________
Marijke Koster, linguistic engineer
Polderland Language & Speech Technology BV
The Netherlands
http://www.polderland.nl
Phone: +31.24.352 28 66
Fax:   +31.24.352 28 60


