[Corpora-List] Software for ngram-document matrix

Nitin Madnani nmadnani at gmail.com
Mon Jan 31 15:14:17 UTC 2011


I have previously used TMG (Text to Matrix Generator) with Matlab.
Although Matlab is not ideal for string processing, it is pretty
useful if you want to do anything numerically interesting with the
matrix once you have it. However, from a cursory examination and my
own use a few years ago, it looks like TMG does not support
n-grams, only single words. It does, though, have the kind of filtering
options you were asking about. Might be worth a look:

http://scgroup20.ceid.upatras.gr:8000/tmg/
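
If TMG's single-word limitation turns out to be a blocker, the matrix
described in the question is small enough to sketch by hand. The Python
below is only an illustration and has nothing to do with how TMG works:
the function names, the dict-of-texts input format, and the whitespace
tokenizer are all assumptions made for the example. It builds a matrix
with one row per file name and one column per n-gram, drops stop words,
and keeps only n-grams whose corpus-wide count reaches a threshold.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams in a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_document_matrix(docs, n=2, min_freq=1, stop_words=frozenset()):
    """Build a document-by-n-gram count matrix.

    docs: dict mapping a file name to its raw text (hypothetical format).
    Returns (row_names, column_ngrams, matrix), one row per document.
    """
    counts = {}
    for name, text in docs.items():
        # Naive tokenization (an assumption): lowercase, split on whitespace.
        tokens = [t for t in text.lower().split() if t not in stop_words]
        counts[name] = Counter(ngrams(tokens, n))

    # Corpus-wide count of each n-gram, used for the frequency filter.
    total = Counter()
    for c in counts.values():
        total.update(c)

    columns = sorted(g for g, f in total.items() if f >= min_freq)
    rows = sorted(counts)
    matrix = [[counts[r][g] for g in columns] for r in rows]
    return rows, columns, matrix


# Toy corpus standing in for a directory of text files.
docs = {
    "a.txt": "the cat sat on the mat",
    "b.txt": "the cat ate the fish",
}
rows, cols, m = ngram_document_matrix(docs, n=2, min_freq=2)
```

Note that stop words are removed before the n-grams are formed, so an
n-gram can span a deleted word. Discarding any n-gram that contains a
stop word is an equally reasonable choice, depending on what you need.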

On Mon, Jan 31, 2011 at 7:48 AM, Georgios Mikros <gmikros at isll.uoa.gr> wrote:
> Dear all,
>
> I am trying to find an open-source tool which will take as input a corpus of
> raw texts and produce an ngram-document matrix with text file-names as rows and
> ngrams as columns. It would be nice if I could filter ngrams based on their
> frequency or using a stop list.
>
> Kind regards
>
> George Mikros
>
>
>
> -----------------------------------
>
> George K. Mikros
>
> Associate Professor
>
> Department of Italian Language and Literature School of Philosophy
> University of Athens Greece
>
> Tel.: +30 210 7277491
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>



-- 
Linguist, Desi Linguist
http://www.desilinguist.org
