<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

    <title></title>

  </head>

  <body bgcolor="#ffffff" text="#000000">

    Dear Ciarán,<br>

    <br>

    As Heike has already said, SMOR might be of interest to you.<br>

    SMOR should be able to handle most of the problems you mention.
    Here are some example analyses:<br>

    <br>

    > Aufsteigen<br>

auf<VPART>steigen<V><SUFF><+NN><Neut><Nom><Sg><br>

auf<VPART>steigen<V><SUFF><+NN><Neut><Dat><Sg><br>

auf<VPART>steigen<V><SUFF><+NN><Neut><Acc><Sg><br>

    // nominalisation of a particle verb<br>

    <br>

    > verkleinertes<br>

verkleinern<V><PPast><SUFF><+ADJ><Pos><Neut><Nom><Sg><St><br>

verkleinern<V><PPast><SUFF><+ADJ><Pos><Neut><Acc><Sg><St><br>

    // adjectivisation of a past participle<br>

    <br>

    > Ähnliches<br>

ähnlich<ADJ><SUFF><+NN><Neut><Nom><Sg><St><br>

ähnlich<ADJ><SUFF><+NN><Neut><Acc><Sg><St><br>

    // nominalisation of an adjective<br>

    <br>

    > Morphologiesysteme<br>

Morphologie<NN>System<+NN><Neut><Dat><Sg><Old><br>

Morphologie<NN>System<+NN><Neut><Nom><Pl><br>

Morphologie<NN>System<+NN><Neut><Gen><Pl><br>

Morphologie<NN>System<+NN><Neut><Acc><Pl><br>

    // compound<br>

    <br>
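    If it helps, analyses like these can be turned into the grouped index you
    describe with a few lines of code. The following is only a minimal Python
    sketch, assuming one SMOR analysis string per word form in the format shown
    above. The function names are hypothetical and not part of SMOR, and other
    SMOR analysis patterns may need additional rules.<br>
<pre>
```python
import re
from collections import defaultdict

# A token is either a tag in angle brackets or a run of stem characters.
TOKEN = re.compile(r"<[^>]*>|[^<]+")


def parse(analysis):
    """Split one SMOR analysis string into (stem, [tags]) segments."""
    segments = []
    for tok in TOKEN.findall(analysis):
        if tok.startswith("<"):
            if segments:  # tags attach to the stem they follow
                segments[-1][1].append(tok[1:-1])
        else:
            segments.append((tok, []))
    return segments


def base_lemmas(analysis):
    """Return (lemma, POS) pairs for the stems in one analysis.

    A stem tagged VPART (verb particle) is glued onto the following stem.
    The POS is the first tag after a stem with any leading '+' removed, so a
    nominalised verb is indexed under its verb, a nominalised adjective under
    its adjective, and each part of a compound separately.
    """
    pairs, particle = [], ""
    for stem, tags in parse(analysis):
        if "VPART" in tags:
            particle += stem
        else:
            pos = tags[0].lstrip("+") if tags else None
            pairs.append((particle + stem, pos))
            particle = ""
    return pairs


def build_index(analysed):
    """Map each (lemma, POS) pair to the set of word forms analysed under it.

    analysed maps a word form to the list of SMOR analysis strings for it.
    """
    index = defaultdict(set)
    for form, analyses in analysed.items():
        for analysis in analyses:
            for key in base_lemmas(analysis):
                index[key].add(form)
    return index
```
</pre>
    Keying the index on (lemma, POS) rather than on the lemma alone still groups
    a nominalised verb with the finite forms of that verb, while keeping
    homographs with different parts of speech (such as arm-ADJ and Arm-NN) apart.<br>
    <br>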

    You could even approach the separable-verb problem by reattaching the
    separated particle to the verb and analysing the result. Take the
    sentence "Er schlägt das Buch auf": you extract "schlägt" and "auf"
    and analyse the recombined word form:<br>

    > aufschlägt<br>

auf<VPART>schlagen<+V><3><Sg><Pres><Ind><br>

    <br>
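    The recombination step can be automated if your text is already tagged
    with STTS, e.g. by TreeTagger. Here is a rough Python sketch, assuming the
    tagger output is available as (word, tag) pairs. The helper
    recombine_particle_verbs is hypothetical, not part of TreeTagger or SMOR,
    and is only a heuristic: it attaches each separated particle (PTKVZ) to the
    most recent finite full verb (VVFIN) in the same sentence.<br>
<pre>
```python
def recombine_particle_verbs(tagged):
    """Reattach separated verb particles to their finite verbs (heuristic).

    tagged is a list of (word, STTS tag) pairs for running text. A particle
    tagged PTKVZ is prefixed to the most recent finite full verb (VVFIN) in
    the same sentence and the particle token itself is dropped, so that the
    recombined form can be passed to a morphological analyser.
    """
    words = []
    verb_pos = None  # index in words of the last VVFIN still awaiting a particle
    for word, tag in tagged:
        if tag == "VVFIN":
            verb_pos = len(words)
            words.append(word)
        elif tag == "PTKVZ" and verb_pos is not None:
            # Lower-case the verb in case it was sentence-initial.
            words[verb_pos] = word.lower() + words[verb_pos].lower()
            verb_pos = None
        else:
            if tag == "$.":  # sentence-final punctuation closes the sentence
                verb_pos = None
            words.append(word)
    return words
```
</pre>
    Sentences with unusual word order, or a particle whose verb is tagged as
    something other than VVFIN, would need extra rules.<br>
    <br>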

    SMOR is not freely available yet, but you can obtain a free research

    license.<br>

    <br>

    Best regards,<br>

      Helmut Schmid<br>

    <br>

    <br>

    On 16.01.2012 22:07, Ciarán Ó Duibhín wrote:

    <blockquote

      cite="mid:7AF24178F3D847EEAE6BD0656C5099F9@InneallChiarin"

      type="cite">


      <div><font face="Arial" size="2">Are there any lemmatized corpora
          of German that can be queried online or on Windows?
          I'm trying to lemmatize some German text myself for lexical
          purposes, and I would like to see how others have handled the
          problems, and how well their solutions work.</font></div>

      <div> </div>

      <div><font face="Arial" size="2">Of the German corpora I have
          found, Negra is POS-tagged but not lemmatized, while Tiger is
          both POS-tagged and lemmatized. Negra does not mention
          any query facility; Tiger had one, which is no longer supported
          and unfortunately doesn't work for me. A problem for me with
          both these corpora is that the tagset they use (STTS) seems to
          be designed with syntax in mind. Here are some examples where
          this may not suit my lexical purposes.</font></div>

      <div> </div>

      <div><font face="Arial" size="2">1. The various finite forms of a
          verb (e.g. aufsteigen) are lemmatized to the infinitive and
          tagged VVFIN, whereas the abstract noun (das Aufsteigen) is
          tagged NN. I think I would like to be able to retrieve them
          all together, e.g. in response to a query for "aufsteigen".</font></div>

      <div> </div>

      <div><font face="Arial" size="2">2. Present participles and past
          participles are tagged as adjectives (ADJA or ADJD). I think I
          would like to retrieve these too under the infinitive of the verb.</font></div>

      <div> </div>

      <div><font face="Arial" size="2">3. Substantivised adjectives are
          tagged as nouns (e.g. etwas Ähnliches). I think I would like
          these retrieved along with the forms of the adjective
          (ähnlich).</font></div>

      <div> </div>

      <div><font face="Arial" size="2">4. Separable verbs are tagged as
          two words when separated and as one word when not separated.
          I think I would like to retrieve separated and non-separated
          examples together, though I have not decided whether this is
          best done by tagging them all as one word or as two.</font></div>

      <div> </div>

      <div><font face="Arial" size="2">5. Compound forms are not

          decompounded.  I think I would like to decompound (most of)

          them.</font></div>

      <div> </div>

      <div><font face="Arial" size="2">Although my interest is in
          lemmas, it is sometimes useful for me to have POS tags as well,
          e.g. to distinguish arm-ADJ from Arm-NN.</font></div>

      <div> </div>

      <div><font face="Arial" size="2">I have run my text through

          TreeTagger, using the training data for STTS, and expect to

          have to make the above changes manually.  Before committing

          myself further, I'd like to try out anything which already

          exists, or to receive any advice.</font></div>

      <div> </div>

      <div><font face="Arial" size="2">Many thanks,<br>

          Ciarán Ó Duibhín.</font></div>

      <pre wrap="">


_______________________________________________


Corpora mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Corpora@uib.no">Corpora@uib.no</a>

<a class="moz-txt-link-freetext" href="http://mailman.uib.no/listinfo/corpora">http://mailman.uib.no/listinfo/corpora</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>