[Corpora-List] fast string replacement

Rob Malouf rmalouf at mail.sdsu.edu
Fri Mar 11 17:32:21 UTC 2005


On Fri, 2005-03-11 at 07:28, Stefan Evert wrote:
> If you're really interested in string replacement (probably with some
> additional code to identify word boundaries), you should be looking at
> finite-state transducers. Two open-source solutions I know are Helmut
> Schmid's FST toolkit (see http://www.ims.uni-stuttgart.de/~schmid) and
> Steve Abney's cascaded parser CASS (you'll have to search Google for
> the source code).
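To make the finite-state idea concrete, here is a minimal plain-Python sketch. It is not how Schmid's toolkit, CASS, or the FSA Utilities work internally. It assumes a simple dictionary of whole-word replacements, stores the rules in a character trie (an acyclic deterministic automaton with an output string on each final state), and scans the text once from left to right. At each word boundary it takes the longest rule that also ends at a word boundary. The names `TrieNode`, `build_trie`, `is_word_char` and `replace_all`, and the rule table in the usage example, are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One state of the acyclic automaton; `output` is set only on final states."""

    __slots__ = ("children", "output")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.output: Optional[str] = None


def build_trie(rules: Dict[str, str]) -> TrieNode:
    """Compile a source -> target replacement table into a character trie."""
    root = TrieNode()
    for source, target in rules.items():
        node = root
        for ch in source:
            node = node.children.setdefault(ch, TrieNode())
        node.output = target
    return root


def is_word_char(ch: str) -> bool:
    # Hypothetical word-boundary definition: letters, digits and underscore.
    return ch.isalnum() or ch == "_"


def replace_all(text: str, root: TrieNode) -> str:
    """Single left-to-right pass with longest whole-word match at each boundary."""
    out: List[str] = []
    i, n = 0, len(text)
    while i < n:
        # A match may only start at a word boundary in the original text.
        at_boundary = i == 0 or not is_word_char(text[i - 1])
        best_end, best_output = -1, None
        if at_boundary:
            node, j = root, i
            while j < n and text[j] in node.children:
                node = node.children[text[j]]
                j += 1
                # Accept a final state only if the match also ends at a boundary;
                # keep walking so that a longer rule can still win.
                if node.output is not None and (j == n or not is_word_char(text[j])):
                    best_end, best_output = j, node.output
        if best_output is not None:
            out.append(best_output)
            i = best_end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


rules = {"New York": "NYC", "colour": "color", "colours": "colors"}
print(replace_all("The colours of New York, not New Yorker.", build_trie(rules)))
# prints "The colors of NYC, not New Yorker."
```

The walk from each boundary is bounded by the length of the longest rule, not by the number of rules. This is the main advantage over applying one regular-expression substitution per rule. A real transducer toolkit goes further: it can minimize the automaton and compose it with other rules.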

You should also consider Gertjan van Noord's FSA Utilities:

http://grid.let.rug.nl/~vannoord/Fsa/fsa.html

It can compile your transducers into Java or C code for portable and/or
efficient execution.

--
Rob Malouf <rmalouf at mail.sdsu.edu>
Department of Linguistics and Oriental Languages
San Diego State University
