[Corpora-List] fast string replacement

Jörg Schuster joerg.schuster at gmail.com
Fri Mar 11 16:17:49 UTC 2005


> Two further questions:
> 
> - What exactly do you mean by "fast"?

I mean really REALLY fast. My rewriting dictionary currently has 1
million lines (and it will grow). My corpus is 80GB, and I would like
to be able to tag it often.

> - Do you mean string replacement (arbitrary substrings in a line of
> text) or word replacement?

String replacement. I usually build the dictionary so that only true
lexemes are tagged, be they single words or multi-word units.
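
A minimal sketch of the kind of replacement meant here: leftmost,
longest-match substring rewriting driven by a dictionary. The names
(TrieNode, build_trie, rewrite), the <LEX> tag format and the sample
entries are assumptions for illustration only, not the actual
dictionary format. Plain Python like this only shows the matching
semantics; it is unlikely to be fast enough for a corpus of this size.

```python
from typing import Dict, Optional


class TrieNode:
    """One node of a character trie over the dictionary's source strings."""
    __slots__ = ("children", "replacement")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.replacement: Optional[str] = None  # set if a source string ends here


def build_trie(rules: Dict[str, str]) -> TrieNode:
    """Build a trie from {source string: replacement string} (hypothetical format)."""
    root = TrieNode()
    for source, target in rules.items():
        if not source:
            continue  # an empty source string would match everywhere; skip it
        node = root
        for ch in source:
            node = node.children.setdefault(ch, TrieNode())
        node.replacement = target
    return root


def rewrite(text: str, root: TrieNode) -> str:
    """Scan left to right; at each position apply the longest matching entry."""
    out = []
    i, n = 0, len(text)
    while i < n:
        node = root
        j = i
        best_end, best_repl = -1, None
        # Walk the trie as far as the text allows, remembering the longest hit.
        while j < n:
            node = node.children.get(text[j])
            if node is None:
                break
            j += 1
            if node.replacement is not None:
                best_end, best_repl = j, node.replacement
        if best_repl is not None:
            out.append(best_repl)
            i = best_end          # continue after the replaced span
        else:
            out.append(text[i])   # no entry starts here; copy one character
            i += 1
    return "".join(out)


# Made-up sample entries: a multi-word unit and a single word.
rules = {
    "New York": "<LEX>New York</LEX>",
    "New": "<LEX>New</LEX>",
}
trie = build_trie(rules)
result = rewrite("New York is New", trie)
```

Because this matches arbitrary substrings, "Newer" would also be
rewritten at "New"; restricting matches to word boundaries would need
an extra check. The multi-word entry wins over its prefix "New"
because the longest match at a position is preferred.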

> Schmid's FST toolkit (see http://www.ims.uni-stuttgart.de/~schmid) and
> Steve Abney's cascaded parser CASS (you'll have to search Google for
> the source code).

I will try this. Thank you.

Jörg Schuster


