[Corpora-List] Python Benchmarks Using Lots of Memory

Carl Friedrich Bolz cfbolz at gmx.de
Fri Nov 12 18:53:28 UTC 2010


Dear Yannick,

On 11/12/2010 07:14 PM, Yannick Versley wrote:
> Most of my own software for CPU- or memory-intensive computation
> uses bits of Cython (i.e. it would be awesome if PyPy could talk to
> cpdef functions in Cython modules and automagically optimize away the
> boxing/unboxing at the PyPy/Cython boundary),

I guess this is getting off-topic for the list, but of course the hope
is that with PyPy you often won't actually need Cython, because PyPy
itself is fast enough :-). Not that we are quite there yet...
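
To make that concrete, here is a sketch of my own (hypothetical, not
taken from any particular project): the kind of tight numeric loop
that typically gets moved into a Cython cpdef function, and exactly
the kind of pure-Python code PyPy's JIT aims to make fast by unboxing
the floats on its own.

def dot(xs, ys):
    """Dot product of two equal-length lists of floats."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

a = [float(i) for i in range(1000)]
b = [float(i) for i in range(1000)]
print(dot(a, b))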

> but here are two examples of code that should fit your bill, in that
> they read in data and use more memory the more data you feed them:
>
> * The DECCA toolkit looks at sequences of POS tags and sequences of words
> http://decca.osu.edu/software.php

Yes, this looks very good. Will look into it. Any quick pointers for a 
corpus I could use?
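
Just so I'm sure I understand the shape of the computation, here is a
toy sketch (my own code, not DECCA's) of the general pattern I would
expect: counting POS-tag n-grams, where memory use grows with the
amount of input.

from collections import defaultdict

def count_pos_ngrams(tagged_sentences, n=3):
    """tagged_sentences: iterable of lists of (word, tag) pairs."""
    counts = defaultdict(int)
    for sentence in tagged_sentences:
        tags = [tag for _word, tag in sentence]
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts

sents = [[("the", "DT"), ("cat", "NN"), ("sat", "VBD"), ("down", "RP")]]
print(count_pos_ngrams(sents, n=2))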

[...]
> NLTK is probably a very good testbed for using PyPy on it since
> * it comes with its own data, so there's no need to hunt for datasets
> or produce synthetic data
> * it's actually written with clarity in mind and probably contains less
> squeezing-the-last-drops-of-performance-out-of-CPython code than
> other projects.

Oh yes, thanks; I had already planned to look into NLTK more.
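
Even something this small would already be a self-contained benchmark
(assuming the Brown corpus is among NLTK's bundled data; if not,
nltk.download('brown') fetches it):

import nltk
from nltk.corpus import brown

# Frequency distribution over the whole Brown corpus; more text
# read in means more distinct keys and thus more memory used.
fd = nltk.FreqDist(w.lower() for w in brown.words())
print(fd.N(), fd["the"])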

Cheers,

Carl Friedrich
