[Corpora-List] Reducing n-gram output

J Washtell lec3jrw at leeds.ac.uk
Tue Oct 28 22:18:31 UTC 2008


Quoting maxwell at umiacs.umd.edu:

> Justin Washtell wrote:
>> ...If you start at the character level, rather than the word level,
>> then you get morphological analysis for free!
>
> Well, morphological analysis is a little more complicated than that :-).
> For one thing, there are plenty of very common substrings that are not
> morphemes.

Yes, and language is nothing if not exceptions, but it is a remarkably
good start for such a simple rule. It is telling that many of the
participants in Morpho Challenge and the like do take a "compactness"
approach, with considerable success - see Creutz & Lagus (2006)
"Morfessor". But yes - point taken - "morphological analysis for free"
should really come with health warnings.
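
As a rough illustration of the caveat quoted above, here is a minimal
sketch that simply counts character n-grams in a toy word list. The word
list and the function name char_ngrams are made up for illustration; this
is plain frequency counting, not Morfessor or any Morpho Challenge system.

```python
from collections import Counter


def char_ngrams(words, n):
    """Count every character n-gram occurring inside the given words."""
    counts = Counter()
    for word in words:
        # Slide a window of width n across the word.
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


# Hypothetical toy vocabulary, chosen only to make the point visible.
words = ["walking", "talking", "thing", "string",
         "the", "then", "other", "mother"]

bigrams = char_ngrams(words, 2)
trigrams = char_ngrams(words, 3)

# The most frequent bigram here is "th", which is not a morpheme.
print(bigrams.most_common(1))

# "ing" occurs in four words, but only in "walking" and "talking"
# is it actually the -ing suffix.
print(trigrams["ing"])
```

Raw frequency alone cannot tell the suffix in "walking" from the
accidental "ing" in "thing", which is presumably where a compactness
criterion over the whole lexicon has to do the extra work.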

Justin Washtell
University of Leeds


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


