[Corpora-List] Reducing n-gram output

svetlana sheremetyeva linklana at yahoo.com
Tue Oct 28 10:15:26 UTC 2008


Hi, Irina
 
I have just made a tool for keyword extraction (LanA-Key) that includes collapsing n-grams. It outputs up to 4-grams, but it can be extended to any "n".
 
The tool can be downloaded for a 3-day free trial from
 
http://lanaconsult.com

Regards,
                    Svetlana Sheremetyeva
               

--- On Mon, 10/27/08, Dahlmann Irina <aexid at nottingham.ac.uk> wrote:

From: Dahlmann Irina <aexid at nottingham.ac.uk>
Subject: [Corpora-List] Reducing n-gram output
To: CORPORA at uib.no
Date: Monday, October 27, 2008, 1:07 PM

Dear all,

I was wondering whether anybody is aware of ideas and/or automated
processes for reducing n-gram output, specifically for the common problem
that shorter n-grams can be fragments of larger structures (e.g. the 5-gram
'at the end of the' as part of the 6-gram 'at the end of the day').

I am only aware of Paul Rayson's work on c-grams (collapsed-grams).
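
To make the problem concrete, here is a minimal sketch in Python of one
naive filter: drop a shorter n-gram when a longer n-gram contains it and
occurs (almost) as often. This is only an illustration of the idea, not
the c-gram method itself; the function names, the `min_ratio` threshold
and the frequencies are all made up for the example.

```python
from typing import Dict, Tuple

Ngram = Tuple[str, ...]


def is_contiguous_sub(short: Ngram, long: Ngram) -> bool:
    """True if `short` occurs as a contiguous run of tokens inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def collapse_ngrams(counts: Dict[Ngram, int],
                    min_ratio: float = 1.0) -> Dict[Ngram, int]:
    """Drop n-grams that look like fragments of a longer n-gram.

    Hypothetical rule: a shorter n-gram is dropped when some longer n-gram
    contains it and has at least `min_ratio` times its frequency.
    """
    kept = {}
    for gram, freq in counts.items():
        subsumed = any(
            len(other) > len(gram)
            and is_contiguous_sub(gram, other)
            and other_freq >= min_ratio * freq
            for other, other_freq in counts.items()
        )
        if not subsumed:
            kept[gram] = freq
    return kept


# Invented frequencies, for illustration only.
counts = {
    ("at", "the", "end", "of", "the"): 12,
    ("at", "the", "end", "of", "the", "day"): 12,
    ("the", "end", "of", "the"): 30,
}
collapsed = collapse_ngrams(counts)
```

With these invented counts the 5-gram is dropped, because the 6-gram
accounts for all 12 of its occurrences, while the 4-gram 'the end of the'
is kept, because it also occurs outside the longer sequence. The pairwise
comparison is quadratic in the number of n-grams, so a large list would
need an index rather than this brute-force loop.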

Many thanks,

Irina Dahlmann
 
PhD student
School of English Studies
University of Nottingham
aexid at nottingham.ac.uk



_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora



      