[Corpora-List] Reducing n-gram output

Michael Maxwell maxwell at umiacs.umd.edu
Tue Oct 28 14:11:14 UTC 2008


Justin Washtell wrote:
> ...If you start at the character level, rather than the word level,
> then you get morphological analysis for free!

Well, morphological analysis is a little more complicated than that :-).
For one thing, there are plenty of very common substrings that are not
morphemes (English "th", for instance, is frequent but carries no meaning
of its own).
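
A minimal sketch of the point, assuming a toy word list invented purely
for illustration (the helper names char_ngrams and ngram_counts are
hypothetical, not from any particular toolkit). It counts character
bigrams and compares a frequent non-morpheme with a real suffix:

```python
from collections import Counter

def char_ngrams(word, n):
    """Return all contiguous character n-grams of a word, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def ngram_counts(words, n):
    """Count character n-grams over a list of words (token counts)."""
    counts = Counter()
    for w in words:
        counts.update(char_ngrams(w, n))
    return counts

# Toy data, made up for this example; a real corpus would be far larger.
WORDS = ["the", "then", "there", "other", "mother", "father",
         "weather", "walked", "talked", "walking"]

counts = ngram_counts(WORDS, 2)

# "th" is not a morpheme, yet in this list it outnumbers the suffix "ed".
for gram in ("th", "ed"):
    print(gram, counts[gram])
```

On this list "th" turns up more often than the past-tense suffix "ed",
so raw substring frequency alone cannot separate morphemes from
frequent letter sequences.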

A lot of work has been done on learning morphology from corpora; one place
to start is with the work by John Goldsmith and his students on
Linguistica.

   Mike Maxwell
   CASL/ U MD


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


