[Corpora-List] fast string replacement
Jörg Schuster
joerg.schuster at gmail.com
Fri Mar 11 16:17:49 UTC 2005
> Two further questions:
>
> - What exactly do you mean by "fast"?
I mean really, REALLY fast. My rewriting dictionary currently has 1
million lines (and it will grow). My corpus is 80GB, and I would like
to be able to tag it often.
> - Do you mean string replacement (arbitrary substrings in a line of
> text) or word replacement?
String replacement. I usually build the dictionary so that only true
lexemes are tagged, be they single words or multi-word units.
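
To make the requirement concrete, here is a minimal sketch of
dictionary-driven string replacement using a character trie with
leftmost-longest matching. It is an illustration only, not the tool
discussed in this thread: the names build_trie and replace_all, the
tag format, and the sample entries are all hypothetical, and a pure
Python loop like this is unlikely to be fast enough for an 80GB
corpus. The point is that one pass over the text handles every
dictionary entry at once, and that multi-word units win over their
single-word prefixes.

```python
# Hypothetical sketch: names, tag format and sample data are assumptions.

def build_trie(rewrites):
    """Build a character trie; the key None marks the end of an entry."""
    root = {}
    for source, target in rewrites.items():
        node = root
        for ch in source:
            node = node.setdefault(ch, {})
        node[None] = target  # replacement text for this entry
    return root


def replace_all(text, trie):
    """Replace leftmost-longest dictionary matches in a single pass."""
    out = []
    i, n = 0, len(text)
    while i < n:
        node, j = trie, i
        best_end, best_target = -1, None
        # Walk the trie as far as the text allows, remembering the
        # longest complete entry seen so far.
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                best_end, best_target = j, node[None]
        if best_end != -1:
            out.append(best_target)
            i = best_end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


rewrites = {
    "New York": "<loc>New York</loc>",
    "New": "<adj>New</adj>",
    "cat": "<n>cat</n>",
}
trie = build_trie(rewrites)
print(replace_all("New York cat", trie))
```

Because this is substring replacement, "cat" would also match inside
a longer word such as "concatenate"; restricting matches to word
boundaries would need an extra check before accepting a match.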
> Schmid's FST toolkit (see http://www.ims.uni-stuttgart.de/~schmid) and
> Steve Abney's cascaded parser CASS (you'll have to search Google for
> the source code).
I will try this. Thank you.
Jörg Schuster