German MOR

RSteinkrauss r.steinkrauss at web.de
Thu Sep 29 07:40:49 UTC 2011


Hello,

we are trying to tag a German corpus morphologically with the German
MOR grammar from the website and are running into some problems.
Sometimes a word is not recognized although it is in the lexicon, and
different CLAN versions yield different results: notably, older CLAN
versions (December 2009) recognize more words than the newest version.

For example, while the noun "Anfang" is part of the lexicon (file
n.cut), it is not recognized:
?|Anfang

When it is written with a lower-case initial (which is ungrammatical
for nouns in German), it is recognized three times: once as a noun and
twice as a verb form:
v|anfangen^n|anfang&an#v|fangen
However, while "anfangen" is a verb in German, "anfang" is not an
existing form of that verb.

To give an example of the differences between versions, the older
CLAN version adds the gender feature &M to the noun:
v|anfangen^n|anfang&M^an#v|fangen
The newest version does not do this (see above).
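To check how often this happens across a whole output file, a small
script can list which noun analyses carry a gender feature. Below is a
minimal Python sketch. It assumes the CHAT %mor conventions that "^"
separates alternative analyses, "n|" marks a noun, and "&" introduces
fusional features. The gender codes M/F/N are an assumption (only &M
appears above), and the function name noun_genders is hypothetical, not
part of CLAN.

```python
# Assumed gender codes; only "&M" appears in the outputs quoted above.
GENDERS = {"M", "F", "N"}


def noun_genders(mor_word: str) -> dict[str, list[str]]:
    """Map each noun stem in one %mor word to the gender codes attached to it.

    Assumes '^' separates alternative analyses, 'n|' marks a noun analysis,
    and '&' introduces fusional features (CHAT %mor conventions).
    """
    result: dict[str, list[str]] = {}
    for alt in mor_word.split("^"):
        if alt.startswith("n|"):
            stem, *feats = alt[2:].split("&")
            # Keep only features that look like gender codes.
            result[stem] = [f for f in feats if f in GENDERS]
    return result


# The two outputs for "anfang" quoted above, as reported.
old_output = "v|anfangen^n|anfang&M^an#v|fangen"  # Dec 2009 CLAN
new_output = "v|anfangen^n|anfang&an#v|fangen"    # newest CLAN

print(noun_genders(old_output))  # {'anfang': ['M']}
print(noun_genders(new_output))  # {'anfang': []}
```

Run over every word on a %mor tier, this would flag nouns whose
analyses lack a gender code in one version but not the other.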

Can anyone help us with this? We would be happy to invest time in
improving the German tagger, but we are not sure how to go about it
and would first like to sort out the errors in the existing MOR
grammar. Any hints are greatly appreciated!

On a related note: Is the info on the German minMOR grammar mentioned
in the file
http://childes.psy.cmu.edu/intro/stephany.pdf
still correct?

Thanks!
Rasmus Steinkrauss
