Corpora: morphemes

Markus Schulze max at linguistik.uni-erlangen.de
Tue Feb 22 13:22:34 UTC 2000


Dear Mrs. Mühlmeyer,

at the URL http://www.linguistik.uni-erlangen.de/LAPTDA/laptda.html,
you will find various lists of allomorphs, morphemes and word forms
extracted from eight corpora, each one million types in size. Seven
of the corpora cover the domains computer science, geography, law,
medicine, sports, linguistics and economy; the eighth is a
representative reference corpus.

The morphemes were extracted with the morphological analyser DMM (see:
http://www.linguistik.uni-erlangen.de/~orlorenz/DMM/DMM.en.html)
which was developed with MALAGA 
(see: http://www.linguistik.uni-erlangen.de/Malaga.en.html).

MALAGA is freely available for non-commercial use, and the DMM soon
will be as well, so that you will then be able to extract your own
morpheme lists from any corpus.
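
To illustrate what such a morpheme list amounts to, here is a minimal
Python sketch. It assumes the word forms are already segmented in the
slash notation used in the quoted request (e.g. /zer/riss/en). It does
not use DMM or MALAGA, and the function names split_morphemes and
morpheme_counts are hypothetical, chosen only for this example.

```python
from collections import Counter


def split_morphemes(segmented):
    """Split a slash-segmented word form such as '/zer/riss/en' into its parts."""
    # Empty strings come from the leading slash and are dropped.
    return [part for part in segmented.strip().split("/") if part]


def morpheme_counts(segmented_forms):
    """Count how often each morpheme occurs across the segmented word forms."""
    counts = Counter()
    for form in segmented_forms:
        counts.update(split_morphemes(form))
    return counts


# The four example word forms from the quoted request.
forms = ["/zer/riss/en", "/mög/lich/keit/en", "/grübel/n", "/grübl/er/isch"]
counts = morpheme_counts(forms)

# Print the morpheme list in alphabetical order with frequencies.
for morpheme, n in sorted(counts.items()):
    print(morpheme, n)
```

The segmentation itself is the hard part and is what a morphological
analyser is needed for; the sketch only covers the bookkeeping once
segmented forms are available.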

I hope that helps.
Markus Schulze

----------------------------------------------------------------------
            Department   for  Computational Linguistics
            Markus Schulze                            
            Bismarckstr. 6      fon:  +49-9131-85-29252
            91054 Erlangen      fax:  +49-9131-85-29251
            http://www.linguistik.uni-erlangen.de/~max/ 
----------------------------------------------------------------------

AM> Dear Colleagues,
AM> I'm looking for a list of morphemes of the German language, as complete
AM> as possible, with the morphemes as short as possible. For example:
AM> /zer/riss/en
AM> /mög/lich/keit/en
AM> /grübel/n
AM> /grübl/er/isch
AM> best regards
AM> Agnes Mühlmeyer-Mentzel
AM> fax: 030 838-55986
AM> work: 030 838-55723


