[Corpora-List] Frequency list of transformations

Viktor Pekar v.pekar at wlv.ac.uk
Fri Jan 21 09:39:06 UTC 2005


Hi Marijke,

Here is a Perl module that can tell which letters need to be
removed/inserted/substituted in one word to get the other:
http://cs.haifa.ac.il/~shlomo/talks/edit_distance/slides/Brew.pm.html
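
If Perl is not an option, the same idea can be sketched in Python. This is only a rough illustration, not what Brew.pm does: it uses the standard difflib module instead of a true edit-distance alignment, and the function names and the "doubled letter" context rule are my own assumptions, added so that a dropped or extra letter next to an identical one is reported as rr -> r rather than r -> ().

```python
from collections import Counter
from difflib import SequenceMatcher

# (wrong, correct) pairs from the quoted message below.
PAIRS = [
    ("occurence", "occurrence"), ("occosion", "occasion"),
    ("commputer", "computer"), ("live", "life"),
    ("heavie", "heavy"), ("geat", "great"), ("save", "safe"),
]


def transformations(correct, wrong):
    """Yield (correct_part, wrong_part) for every span where the words differ."""
    matcher = SequenceMatcher(None, correct, wrong, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old, new = correct[i1:i2], wrong[j1:j2]
        if not old or not new:
            # Pure insertion or deletion. Assumed heuristic: if the letter
            # equals its neighbour in the correct word, keep that neighbour
            # as context so doubling shows up as "m -> mm", not "() -> m".
            changed = old or new
            before = correct[i1 - 1] if i1 > 0 else ""
            after = correct[i2] if i2 < len(correct) else ""
            if changed == before:
                old, new = before + old, before + new
            elif changed == after:
                old, new = old + after, new + after
        yield old, new


def transformation_frequencies(pairs):
    """Count transformations over an iterable of (wrong, correct) pairs."""
    counts = Counter()
    for wrong, correct in pairs:
        counts.update(transformations(correct, wrong))
    return counts


if __name__ == "__main__":
    for (old, new), n in transformation_frequencies(PAIRS).most_common():
        print(n, old or "()", "->", new or "()")
```

On the seven example pairs this should give the same counts as the list in the quoted message, but difflib does not guarantee a minimal alignment, so longer or messier misspellings may be split differently than an edit-distance trace would split them.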

Viktor

----- Original Message -----
From: "Marijke Koster" <marijke at polderland.nl>
To: <CORPORA at UIB.NO>
Sent: Friday, January 21, 2005 8:44 AM
Subject: [Corpora-List] Frequency list of transformations


Dear corpora list members,

Does anyone have a suggestion for a simple method / a script to extract
a frequency list of transformations from a list of spelling errors and
corrections?

For example, here is a tab-separated list:

wrong      correct
-----      -------
occurence  occurrence
occosion   occasion
commputer  computer
live       life
heavie     heavy
geat       great
save       safe

After applying the method, the result should be something like this
(correct form -> misspelling):
1 rr -> r
1 a  -> o
1 m  -> mm
2 f  -> v
1 y  -> ie
1 r  -> ()

Thanks in advance,
Marijke Koster
______________________________________
Marijke Koster, linguistic engineer
Polderland Language & Speech Technology BV
The Netherlands
http://www.polderland.nl
Phone: +31.24.352 28 66
Fax:   +31.24.352 28 60


