[Corpora-List] Python Benchmarks Using Lots of Memory

Adam Radziszewski kocikikut at gmail.com
Mon Nov 15 09:30:49 UTC 2010


Dear Carl,
we've got such an application: a Python implementation of a
morpho-syntactic tagger and a simple NP chunker for Polish. The software
meets the requirements ideally: it takes awful memory loads and is
pretty slow ;) and it's useful as a testbed for tweaking the
tagging algorithm and classifiers. The shortcoming is that the system
has many dependencies and one of them relies on C code (the Orange
ML suite), so it may be a bit complicated to figure out where the
bottleneck is. I tried running a profiler and it seems that a large
share of the time is spent on pickling/unpickling the data.
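
For anyone who wants to check the pickling observation on their own
data, here is a minimal sketch using only the standard library
(cProfile, pstats, pickle). It is not taken from the tagger's code:
the names roundtrip and profile_call and the sample data are made up
for illustration, and the real workload would replace roundtrip.

```python
import cProfile
import io
import pickle
import pstats


def roundtrip(data):
    """Pickle and unpickle data (hypothetical stand-in for the real workload)."""
    blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
    return pickle.loads(blob)


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    # Made-up records; an assumption, not the tagger's actual data format.
    sample = [{"token": "t%d" % i, "tags": ["subst", "sg", "nom"]}
              for i in range(50000)]
    _, report = profile_call(roundtrip, sample)
    print(report)
```

If the pickle calls dominate the cumulative-time column in such a
report, that would be consistent with what I saw, though the C-level
dependencies may hide part of the picture.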

Nevertheless, if you are interested, I'd be glad to have the code tested.

Here's the link:
http://nlp.pwr.wroc.pl/trac/private/disaster/

The Trac contains installation instructions and the “how to
reproduce…” manual, which could be the thing to run.

Best,
Adam Radziszewski

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

