[Corpora-List] Reducing n-gram output
Michael Maxwell
maxwell at umiacs.umd.edu
Tue Oct 28 14:11:14 UTC 2008
Justin Washtell wrote:
> ...If you start at the character level, rather than the word level,
> then you get morphological analysis for free!
Well, morphological analysis is a little more complicated than that :-).
For one thing, there are plenty of very common substrings that are not
morphemes.
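To make the point about frequent non-morphemic substrings concrete, here is a
minimal Python sketch that counts character trigrams over a small word list.
The word list and the hand-labeled ING_IS_SUFFIX set are hypothetical, made up
for illustration rather than taken from any corpus. This is plain n-gram
counting, not Linguistica or any other published method.

```python
from collections import Counter

# Hypothetical toy word list, chosen for illustration only (not corpus data).
WORDS = [
    "walking", "talking", "jumping",             # stem + suffix -ing
    "thing", "king", "ring", "bring", "string",  # "ing" is part of the stem
    "walked", "talked", "jumped",                # stem + suffix -ed
]

# Hand-labeled for this sketch: words where "ing" really is the suffix -ing.
ING_IS_SUFFIX = {"walking", "talking", "jumping"}


def char_ngrams(word, n):
    """Return every contiguous character n-gram in `word`."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def count_ngrams(words, n):
    """Count character n-grams across all words (within word boundaries only)."""
    counts = Counter()
    for w in words:
        counts.update(char_ngrams(w, n))
    return counts


trigram_counts = count_ngrams(WORDS, 3)
print(trigram_counts.most_common(4))

# How many of the words containing "ing" actually carry the -ing suffix?
ing_words = [w for w in WORDS if "ing" in w]
ing_suffixed = [w for w in ing_words if w in ING_IS_SUFFIX]
print(f"'ing' occurs in {len(ing_words)} words; it is the suffix in {len(ing_suffixed)}")
```

On this toy list "ing" is the most frequent trigram (8 occurrences), yet it is
the -ing suffix in only 3 of those words; "alk" (4) and "kin" and "rin" (3 each)
are also frequent, and none of them is a morpheme in these words. Frequency
alone does not tell you where the morpheme boundaries are.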
A lot of work has been done on learning morphology from corpora; one place
to start is the work by John Goldsmith and his students on
Linguistica.
Mike Maxwell
CASL/ U MD
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora