[Corpora-List] Morphological segmentation

John A Goldsmith ja-goldsmith at uchicago.edu
Thu Jan 27 14:17:36 UTC 2005


In connection with the Linguistica project (http://linguistica.uchicago.edu
and http://linguistica.uchicago.edu/alchemist.html), we are in the process of
building gold standards of morphological segmentation in a common XML format
for a number of languages. Our concern is more with morphological
segmentation (and allomorphy) and less with tagging of morphosyntactic
features.

I would very much appreciate pointers to any lists of words, in any
language, with an indication of correct morphological segmentation, or
pointers to software that does a good job of accomplishing this in
particular languages.

Some morphological parsers, such as Namer’s FLEMM mentioned by Jean Véronis,
focus (as far as I can tell) on providing lemmatization or morphosyntactic
features; these do not help us with our task. In addition, since our goal is
to use these gold standards for testing rather than for training, accuracy
is particularly important.

I’ll post a summary of all responses I receive. Thanks very much!

John Goldsmith 
