[Corpora-List] Reducing n-gram output
J Washtell
lec3jrw at leeds.ac.uk
Tue Oct 28 22:18:31 UTC 2008
Quoting maxwell at umiacs.umd.edu:
> Justin Washtell wrote:
>> ...If you start at the character level, rather than the word level,
>> then you get morphological analysis for free!
>
> Well, morphological analysis is a little more complicated than that :-).
> For one thing, there are plenty of very common substrings that are not
> morphemes.
Yes, and language is nothing if not full of exceptions, but this is a
remarkably good start for such a simple rule. It is telling that many of
the participants in MorphoChallenge etc. do take a "compactness"
approach, with considerable success; see Creutz & Lagus (2006),
"Morfessor". But yes, point taken: "morphological analysis for free"
should really come with a health warning.
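A toy sketch of the character-level starting point, in Python. It assumes
a small hypothetical word list and plain trigram counting with
word-boundary markers. This is not the compactness-based method used by
Morfessor, just raw substring frequency:

```python
from collections import Counter

def char_ngrams(word, n):
    """Return all contiguous character n-grams of a word.

    '<' and '>' mark the word boundaries, so suffix-like n-grams
    such as 'ng>' can be told apart from word-internal ones.
    """
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def top_ngrams(words, n, k):
    """Count character n-grams over all words; return the k most frequent."""
    counts = Counter()
    for w in words:
        counts.update(char_ngrams(w, n))
    return counts.most_common(k)

# Hypothetical toy word list, chosen only for illustration.
words = ["walking", "talking", "walked", "talked", "thing", "string"]
print(top_ngrams(words, 3, 3))
```

The three most frequent trigrams are "alk", "ing" and "ng>". The last
two do include the genuine suffix -ing, but they are also counted in
"thing" and "string", where -ing is not a morpheme. "alk" is just as
frequent and is not a morpheme at all. Frequency alone, without some
further criterion such as compactness, over-segments.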
Justin Washtell
University of Leeds
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora