Corpora: morphems
Markus Schulze
max at linguistik.uni-erlangen.de
Tue Feb 22 13:22:34 UTC 2000
Dear Mrs. Mühlmeyer,
At the URL http://www.linguistik.uni-erlangen.de/LAPTDA/laptda.html,
you will find various lists of allomorphs, morphemes and word forms
extracted from eight corpora, each one million types in size.
Seven of the corpora cover the domains computer science,
geography, law, medicine, sports, linguistics and economy; the eighth is a
representative reference corpus.
The morphemes were extracted with the morphological analyser DMM (see:
http://www.linguistik.uni-erlangen.de/~orlorenz/DMM/DMM.en.html)
which was developed with MALAGA
(see: http://www.linguistik.uni-erlangen.de/Malaga.en.html).
MALAGA is freely available for non-commercial use, and DMM will be
soon as well, so you will then be able to extract your own morpheme
lists from any corpus.
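
As a rough illustration only: the sketch below does not use DMM or MALAGA
and makes no claim about their output format. It simply assumes that
segmented word forms are available in the slash-delimited notation used in
the quoted request below, and tallies the segments into a frequency list.
The function names split_morphemes and morpheme_counts are hypothetical.

```python
from collections import Counter


def split_morphemes(segmented):
    """Split a slash-delimited form such as '/zer/riss/en' into its segments."""
    # Assumption: segments are separated by '/', with an optional leading slash.
    return [part for part in segmented.strip().split("/") if part]


def morpheme_counts(segmented_forms):
    """Count how often each segment occurs across all segmented word forms."""
    counts = Counter()
    for form in segmented_forms:
        counts.update(split_morphemes(form))
    return counts


# The four example segmentations from the quoted request.
forms = ["/zer/riss/en", "/mög/lich/keit/en", "/grübel/n", "/grübl/er/isch"]
counts = morpheme_counts(forms)

# Print the segments, most frequent first.
for morpheme, freq in counts.most_common():
    print(morpheme, freq)
```

With real analyser output, only split_morphemes would need adapting to
whatever format the analyser actually produces.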
Hope that helps
Markus Schulze
----------------------------------------------------------------------
Department for Computational Linguistics
Markus Schulze
Bismarckstr. 6 fon: +49-9131-85-29252
91054 Erlangen fax: +49-9131-85-29251
http://www.linguistik.uni-erlangen.de/~max/
----------------------------------------------------------------------
AM> Dear Colleagues,
AM> I'm looking for a list of morphemes of the German language, as complete as
AM> possible, with the morphemes as short as possible. For example:
AM> /zer/riss/en
AM> /mög/lich/keit/en
AM> /grübel/n
AM> /grübl/er/isch
AM> best regards
AM> Agnes Mühlmeyer-Mentzel