[Corpora-List] Frequency count of all n-word phrases in a text
Maarten van Gompel
proycon at anaproy.nl
Thu Jan 2 18:17:01 UTC 2014
On Wed, Jan 01, 2014 at 02:49:45PM +0000, Stone, Dan wrote:
>I’m looking for easy-to-use software (or code, if necessary) to search for all n word phrases in large texts. Any suggestions?
Hi Dan,
I've developed software to do precisely that: extraction of n-grams as
well as non-consecutive patterns (skipgrams). The software is written
in C++ for speed and memory efficiency, but it comes with a Python
binding for use from Python scripts. It also has a standalone CLI
tool that can do what you want.
See https://github.com/proycon/colibri-core and
http://proycon.github.io/colibri-core/doc/ for the documentation.
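If you just want to see what such counting involves on a small text,
here is a minimal pure-Python sketch. It is NOT colibri-core and does
not use its API; the function names and the "{*}" gap marker are
hypothetical, and the skipgram definition is a simplification (each
n-token window with one or more inner tokens replaced by a gap, first
and last token always kept). It keeps every distinct pattern in memory
as a tuple, so it will not scale to very large corpora, which is where
a dedicated tool pays off.

```python
from collections import Counter
from itertools import combinations

GAP = "{*}"  # hypothetical placeholder for one skipped token

def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens (an n-gram)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def skipgram_counts(tokens, n):
    """Count n-token windows with one or more inner tokens replaced by GAP.

    The first and last token of each window are always kept, so
    skipgrams only exist for n >= 3 (smaller n yields an empty Counter).
    """
    counts = Counter()
    inner = range(1, n - 1)  # window positions that may be skipped
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        # every non-empty subset of inner positions gives one skipgram
        for k in range(1, n - 1):
            for skipped in combinations(inner, k):
                pattern = tuple(GAP if j in skipped else tok
                                for j, tok in enumerate(window))
                counts[pattern] += 1
    return counts

if __name__ == "__main__":
    tokens = "to be or not to be that is the question".split()
    # most frequent bigrams with their counts
    for phrase, freq in ngram_counts(tokens, 2).most_common(3):
        print(" ".join(phrase), freq)
    # frequency of the skipgram "to {*} or"
    print(skipgram_counts(tokens, 3)[("to", GAP, "or")])
```

Tokenisation here is a plain whitespace split; for real text you
would lowercase, strip punctuation, or use a proper tokenizer first.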
Regards,
--
Maarten van Gompel
Centre for Language Studies
Radboud Universiteit Nijmegen
proycon at anaproy.nl
http://proycon.anaproy.nl
http://github.com/proycon
GnuPG key: 0x1A31555C XMPP: proycon at anaproy.nl
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora