[Corpora-List] Q: morpheme lexicons on WWW?
Mike Maxwell
maxwell at ldc.upenn.edu
Wed Oct 5 00:22:16 UTC 2005
Eric Atwell wrote:
> Can anyone recommend a source of morpheme lexicons/dictionaries findable
> on WWW, covering a wide range of languages? Basically a list of all
> morphemes for each language,
Isn't what you really want a list of allomorphs--or more precisely,
allographs? E.g. for English, not just the suffix -s, but its allograph
-es; and not just the root 'try', but also its allograph 'tri' or 'trie'
(as in 'he tries too hard'; note that the morpheme boundary here is
unclear, although from their example "invited" -> invit-ed rather than
*invite-d, I would assume the 'tri-ed' segmentation is the one they're
looking for). Likewise for Turkish, where vowel harmony is represented in
the orthography, so that a single suffix has several written forms, e.g.
plural -ler and -lar (similarly for Finnish, I _think_).
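To make the allograph point concrete, here is a minimal sketch in Python of a lookup against a lexicon that lists written variants rather than abstract morphemes. The lexicon contents, the labels, and the function name `segment` are all made up for illustration; this is not taken from any existing resource or from the MorphoChallenge materials.

```python
# Hypothetical toy lexicon: each morpheme label maps to its written
# variants (allographs). The entries are illustrative assumptions only.
ROOTS = {
    "try": ["try", "tri"],
    "invite": ["invite", "invit"],
}
SUFFIXES = {
    "-s": ["s", "es"],
    "-ed": ["ed"],
}


def segment(word):
    """Return every root + optional suffix split licensed by the toy lexicon.

    Each result is (root label, root allograph, suffix label, suffix allograph);
    the suffix label is None when the word is a bare root.
    """
    results = []
    for root, root_forms in ROOTS.items():
        for root_form in root_forms:
            if not word.startswith(root_form):
                continue
            rest = word[len(root_form):]
            if rest == "":
                results.append((root, root_form, None, ""))
                continue
            for suffix, suffix_forms in SUFFIXES.items():
                if rest in suffix_forms:
                    results.append((root, root_form, suffix, rest))
    return results


print(segment("tries"))    # root allograph 'tri' plus suffix allograph 'es'
print(segment("invited"))  # 'invit' + 'ed'; 'invite' + 'd' is not licensed
```

With only 'try' and '-s' in the lexicon, "tries" would find no analysis at all, which is why a list of allographs seems to be what is needed. Note also that a bare list like this overgenerates: nothing stops 'tri' from being accepted as a word on its own unless the lexicon also records which allograph goes in which context.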
For Turkish, years ago Jorge Hankamer wrote a morphological parser in C
which had a large list of roots and affixes. I have no idea whether he
ever put that in the public domain.
BTW, I thought the point of the MorphoChallenge was not just to infer the
morpheme boundaries in words, but to infer the lists of morphemes
themselves. But the rules don't explicitly say that...
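The difference between the two tasks can be sketched roughly as follows, in Python. The segmented words and the name `morph_inventory` are hypothetical, and the "-" boundary marker is just an assumption about what a segmenter's output might look like.

```python
from collections import Counter

# Hypothetical output of a segmenter: boundaries marked with "-".
segmented = ["tri-es", "tri-ed", "invit-ed", "walk-s"]


def morph_inventory(segmented_words):
    """Count each surface morph that occurs between boundary markers."""
    counts = Counter()
    for word in segmented_words:
        counts.update(word.split("-"))
    return counts


inventory = morph_inventory(segmented)
print(sorted(inventory.items()))
```

Collecting the pieces gives a list of surface forms, in which 'es' and 's' are separate entries. Deciding that those two belong to the same morpheme is a further step that the boundaries alone do not supply.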
--
Mike Maxwell
maxwell at ldc.upenn.edu