German MOR

RSteinkrauss r.steinkrauss at web.de
Tue Oct 4 08:15:43 UTC 2011


Brian,

I don't know all the German CHILDES corpora equally well, but I have
worked a lot with the Leo corpus submitted by Heike Behrens. The
corpus is quite dense and has the advantage of being transcribed in
standard orthography, except for a few forms that were transcribed
faithfully to what was said. A drawback is that the corpus does
not use umlauts and sharp s (ß) but ae and ss instead.
Still, I think that being able to tag that corpus with a high degree
of accuracy would be a big step, also for other German corpora.
Monika Schmid has offered another corpus of 150 adults which has been
tagged morphologically and disambiguated by hand. Maybe it would be a
good idea to include that corpus in order to train the tagger
and build a lexicon?
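
To make the ae/ss drawback concrete: a transcribed form such as "Masse"
could stand for either "Masse" or "Maße", so a lexicon lookup has to
consider both spellings. The following is only a rough Python sketch of
that idea and has nothing to do with how MOR itself works. The function
names and the toy lexicon are made up for illustration, and it assumes
that oe and ue are transliterated in the same way as ae.

```python
from itertools import product

# Transliterated digraph -> standard-orthography character.
# Assumption for this sketch: oe/ue are handled like ae.
DIGRAPHS = {
    "ae": "ä", "oe": "ö", "ue": "ü", "ss": "ß",
    "Ae": "Ä", "Oe": "Ö", "Ue": "Ü",
}


def candidate_spellings(word):
    """Return all spellings obtained by optionally restoring each digraph."""
    pieces = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            # Keep the digraph as an option: "ss" in "Wasser" is already correct.
            pieces.append((pair, DIGRAPHS[pair]))
            i += 2
        else:
            pieces.append((word[i],))
            i += 1
    return {"".join(choice) for choice in product(*pieces)}


def lexicon_matches(word, lexicon):
    """Return the candidate spellings of `word` that the lexicon contains."""
    return sorted(c for c in candidate_spellings(word) if c in lexicon)


if __name__ == "__main__":
    toy_lexicon = {"Masse", "Maße", "Mädchen", "Mauer"}  # hypothetical entries
    for form in ("Maedchen", "Mauer", "Masse"):
        print(form, lexicon_matches(form, toy_lexicon))
```

Forms with exactly one match can be restored automatically; forms like
"Masse" with two matches would still need disambiguation in context.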

Rasmus


On 2 Oct., 18:45, Brian MacWhinney <m... at cmu.edu> wrote:
> Rasmus,
>      In the MOR-POST framework, the task of distinguishing a nominalization such as "Wissen" in "das Wissen" from the verb wissen is handled by a trained statistical disambiguator.  This disambiguator does not yet exist for German, largely because basic part-of-speech tagging is still inaccurate and incomplete.  Once we get a proper MOR, the next step is to get a gold corpus and to tag that by repeated bootstrapping.  Right now, I could use help mostly in the area of lexicon building.  If we could settle on an initial corpus to tag, that would be the first step.  Then we run mor +xl on the corpus and start to add missing words.  Which of the corpora in CHILDES would be a good target?  Corpora that use standard target-language word forms would make things easier.
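
The lexicon-building loop described above (find the words the lexicon
does not cover, add them, repeat) can be illustrated with a small Python
sketch. It does not reproduce mor +xl; it only assumes that main tiers in
a CHAT file start with "*" while headers start with "@" and dependent
tiers with "%". The function name, the naive tokenization, and the sample
lines are hypothetical.

```python
import re
from collections import Counter


def unknown_word_counts(chat_lines, lexicon):
    """Count main-tier word tokens that are not listed in the lexicon."""
    counts = Counter()
    for line in chat_lines:
        # Skip "@" headers and "%" dependent tiers; keep "*SPK:" main tiers.
        if not line.startswith("*"):
            continue
        _, _, utterance = line.partition(":")
        # Naive tokenization: runs of letters only (drops punctuation, digits).
        for token in re.findall(r"[^\W\d_]+", utterance):
            if token not in lexicon:  # exact match, so capitalization matters
                counts[token] += 1
    return counts


if __name__ == "__main__":
    sample = [  # made-up lines, not taken from any corpus
        "@Begin",
        "*CHI:\tdas Wissen ist gross .",
        "%act:\tpoints at car",
        "*MOT:\tdas Auto faehrt .",
        "*CHI:\tWissen .",
    ]
    toy_lexicon = {"das", "ist", "Auto", "gross"}
    # Most frequent gaps first, so the most useful entries get added first.
    for word, n in unknown_word_counts(sample, toy_lexicon).most_common():
        print(n, word)
```

Sorting the gaps by frequency is a design choice of the sketch: each
lexicon entry added then covers as many tokens as possible.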
>
> --Brian MacWhinney
>
> On Oct 1, 2011, at 4:11 AM, RSteinkrauss wrote:
>
> > Brian,
>
> > thanks for your reply. Kevin's idea seems like a promising approach,
> > and I am glad you want to give it a try! I wouldn't really
> > object to dropping the capitalization of common nouns, but this
> > would introduce many ambiguities that would have to be resolved in
> > a second step, such as distinguishing adjectives/verbs from their
> > nominalizations or from other nouns with the same form, so this
> > would not be ideal either.
> > Thank you very much for investing time into this - I would be glad to
> > hear how we can help!
>
> > Rasmus
>
> > On 30 Sep., 21:23, Brian MacWhinney <m... at cmu.edu> wrote:
> >> Kevin,
>
> >> Yes, that is a possible approach.  In fact, MOR works this way already.  If a common noun is in the lexicon in capitalized form, it gets recognized.  If it is not in the lexicon, it is treated as a proper noun.  Then, as you suggest, the next step is to run FREQ on the %mor line and check all the proper nouns to see whether any common nouns have slipped through.  This is a bit messier than the process for the other languages, but I guess it is doable and we end up with stuff that at least looks like German.  I will give it a try.
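
As a rough illustration of the two steps just described (lexicon lookup
with a "proper" fallback, followed by a frequency check of everything
that was tagged proper), here is a minimal Python sketch. It is neither
MOR nor FREQ; the tag labels "noun", "proper" and "unknown", the function
names, and the toy lexicon are placeholders chosen for the example.

```python
from collections import Counter


def tag_word(word, lexicon):
    """Tag a word from the lexicon, falling back on capitalization."""
    if word in lexicon:
        return lexicon[word]
    if word[:1].isupper():
        # Capitalized but unlisted: provisionally treated as a proper noun.
        return "proper"
    return "unknown"


def proper_candidates(words, lexicon):
    """Count the words that received the fallback tag, for manual review."""
    return Counter(w for w in words if tag_word(w, lexicon) == "proper")


if __name__ == "__main__":
    toy_lexicon = {"der": "det", "Hund": "noun", "bellt": "verb"}
    words = ["der", "Hund", "bellt", "Leo", "Katze", "Leo", "Katze", "Katze"]
    for word, n in proper_candidates(words, toy_lexicon).most_common():
        print(n, word)
```

In this toy run "Katze" and "Leo" both come back as proper. A reviewer
would add "Katze" to the lexicon as a common noun and leave "Leo" alone,
which is the manual check that the frequency list is meant to support.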
>
> >> -- Brian
>
> >> On Sep 30, 2011, at 2:29 PM, Kevin Donnelly wrote:
>
> >>> Hi
>
> >>> ::::On Friday 30 September 2011 Brian MacWhinney said::::
> >>>> German is the only language that capitalizes these and it makes the tagging
> >>>> job more difficult, because then one cannot readily distinguish proper
> >>>> from common nouns.  This then means that you have to list all proper
> >>>> nouns, which is a big job that could never be close to complete.
>
> >>> Would it not be possible to look up a capitalised word, and return the noun if
> >>> it exists in the lexicon, and something like "name" or "proper" if it doesn't?  
> >>> Then you would only have to check the names, and add them to the lexicon where
> >>> they were in fact nouns.  That is how we're doing it in the autoglosser, but
> >>> maybe that approach would not fit in with the MOR architecture?
>
> >>> --
> >>> Pob hwyl / Best wishes
>
> >>> Kevin Donnelly
> >>> kevindonnelly.org.uk
>
> >>> --
> >>> You received this message because you are subscribed to the Google Groups "chibolts" group.
> >>> To post to this group, send email to chibolts at googlegroups.com.
> >>> To unsubscribe from this group, send email to chibolts+unsubscribe at googlegroups.com.
> >>> For more options, visit this group at http://groups.google.com/group/chibolts?hl=en.
>



