[Corpora-List] Frequency count of all n word phrases in a text

Maarten van Gompel proycon at anaproy.nl
Thu Jan 2 18:17:01 UTC 2014


On Wed, Jan 01, 2014 at 02:49:45PM +0000, Stone, Dan wrote:
>I’m looking for easy-to-use software (or code, if necessary) to search for all n word phrases in large texts.  Any suggestions?

Hi Dan, 

I've developed software that does precisely that: it extracts n-grams as
well as non-consecutive patterns (skipgrams). The software is
written in C++ for speed and memory efficiency, but comes with a Python
binding for use from Python scripts. It also includes a standalone
command-line tool that can do what you want.
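
To make the two notions concrete, here is a minimal plain-Python sketch of
what counting n-grams and skipgrams means. It does not use Colibri Core or
its Python binding; the function names ngram_counts and skipgram_counts are
made up for illustration, and the skipgram variant shown (first and last
token kept, everything in between replaced by a "*" gap) is only one simple
kind of skipgram.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def skipgram_counts(tokens, n):
    """Count length-n patterns with a gap in the middle (hypothetical helper).

    Assumption: only the first and last token of each window are kept;
    the n - 2 tokens in between are replaced by the wildcard "*".
    """
    if n < 3:
        raise ValueError("a skipgram with an inner gap needs n >= 3")
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        counts[(window[0],) + ("*",) * (n - 2) + (window[-1],)] += 1
    return counts


if __name__ == "__main__":
    # Naive whitespace tokenisation, good enough for a demonstration.
    tokens = "to be or not to be".split()
    for ngram, freq in ngram_counts(tokens, 2).most_common():
        print(" ".join(ngram), freq)
    for pattern, freq in skipgram_counts(tokens, 3).most_common():
        print(" ".join(pattern), freq)
```

Something like this is fine for small texts, but it keeps every pattern in
memory as a Python tuple, so for large corpora a dedicated tool is the
better choice.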

See https://github.com/proycon/colibri-core and 
http://proycon.github.io/colibri-core/doc/ for the documentation.

Regards,

--

Maarten van Gompel
  Centre for Language Studies
  Radboud Universiteit Nijmegen

proycon at anaproy.nl
http://proycon.anaproy.nl
http://github.com/proycon

GnuPG key: 0x1A31555C  XMPP: proycon at anaproy.nl
