[Corpora-List] Q: morpheme lexicons on WWW?

Mike Maxwell maxwell at ldc.upenn.edu
Wed Oct 5 00:22:16 UTC 2005


Eric Atwell wrote:
 > Can anyone recommend a source of morpheme lexicons/dictionaries findable
 > on WWW, covering a wide range of languages? Basically a list of all
 > morphemes for each language,

Isn't what you really want a list of allomorphs, or more precisely,
allographs?  E.g. for English, not just the suffix -s but also its
allograph -es; and not just the root 'try' but also its allograph 'tri' or
'trie' (as in 'he tries too hard').  The morpheme boundary there is
unclear, but since their example segments "invited" as invit-ed rather
than *invite-d, I would assume 'tri-es' is the segmentation they're
looking for.  The same goes for Turkish, where vowel harmony is
represented in the orthography, so an affix has several written forms
(similarly for Finnish, I _think_).
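
To make the allograph point concrete, here is a minimal sketch in Python.
The toy lexicon, the names ROOTS and SUFFIXES, and the function segment are
all hypothetical, made up for illustration; the sketch handles only a root
plus at most one suffix and is not meant as a real morphological parser.

```python
# Hypothetical toy lexicon: each morpheme maps to its written variants
# (allographs). The entries are illustrative assumptions only.
ROOTS = {"try": ["try", "tri"], "invite": ["invite", "invit"], "cat": ["cat"]}
SUFFIXES = {"-s": ["s", "es"], "-ed": ["ed"]}


def segment(word):
    """Return all (root, root allograph, suffix, suffix allograph) analyses."""
    analyses = []
    for root, root_forms in ROOTS.items():
        for root_form in root_forms:
            if not word.startswith(root_form):
                continue
            rest = word[len(root_form):]
            if rest == "":
                # Bare root, no suffix.
                analyses.append((root, root_form, None, ""))
            for suffix, suffix_forms in SUFFIXES.items():
                if rest in suffix_forms:
                    analyses.append((root, root_form, suffix, rest))
    return analyses


print(segment("tries"))    # [('try', 'tri', '-s', 'es')]
print(segment("invited"))  # [('invite', 'invit', '-ed', 'ed')]
```

If the lexicon listed only 'try' and 's', 'tries' would get no analysis at
all, which is why a list of morphemes alone is not enough here.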

For Turkish, years ago Jorge Hankamer wrote a morphological parser in C 
which had a large list of roots and affixes.  I have no idea whether he 
ever put that in the public domain.

BTW, I thought the point of the MorphoChallenge was not just to infer the
morpheme boundaries in words, but to infer the list of morphemes
themselves.  But the rules don't explicitly say that...
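
The difference between the two tasks can be sketched as follows. This is
only an illustration under my own assumptions: the segmented words, the "+"
boundary notation, and the hand-written MORPHEME_OF mapping are all
hypothetical and are not taken from the MorphoChallenge rules.

```python
from collections import defaultdict

# Hypothetical boundary-only output: surface segments joined by "+".
segmented = ["tri+es", "try", "invit+ed", "invite+s", "cat+s"]

# Boundaries alone give an inventory of surface forms (allographs).
allographs = sorted({seg for word in segmented for seg in word.split("+")})

# A morpheme list also needs those forms grouped under one morpheme each.
# The grouping is written by hand here (an assumption); inferring it
# automatically is the additional task.
MORPHEME_OF = {
    "try": "try", "tri": "try",
    "invite": "invite", "invit": "invite",
    "cat": "cat",
    "s": "-s", "es": "-s",
    "ed": "-ed",
}

morphemes = defaultdict(set)
for form in allographs:
    morphemes[MORPHEME_OF[form]].add(form)

print(len(allographs), "surface forms,", len(morphemes), "morphemes")
```

With this toy data the boundaries yield eight surface forms but only five
morphemes, so the two outputs are not interchangeable.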
-- 
	Mike Maxwell
	maxwell at ldc.upenn.edu


