Corpora: word-formation systems

Anke Lüdeling aluedeli at uos.de
Wed May 8 12:52:55 UTC 2002


Dear list members,

Last week I asked for information about computational morphology
systems that deal with word-formation. I received a number of helpful
replies; thank you very much.

I would like to thank the following colleagues:

Antti Arppe
Janne Bondi Johannessen
Rodolfo Delmonte
Sergei A. Koval
Kemal Oflazer
Dan Tufis
Alexander S. Yeh

Below I have summarized, by language, the responses I received
together with the information on word-formation systems that I already
had. I have given the URL where available and added comments where I
could. (I have not yet had time to look at all the links and papers
that were provided, but I will certainly try to do so.)

---
English

ALE-RA
http://nl.ijs.si/et/Thesis/ALE-RA/


"Alexander S. Yeh" wrote:

> UMLS (Unified Medical Language System?) is a U.S. Government program that
> provides among other things, a free morphological variation system for
> mainly English medical terms.


----
Finnish (& other languages)

Antti Arppe wrote:

> A Finnish language technology company, Lingsoft <www.lingsoft.fi> has
> used their morphological models (based on the two-level principle and
> model by Koskenniemi) for generating inflected word forms in
> inflecting thesauri, i.e. synonym dictionaries that can handle the
> inflected forms of the synonyms as well. The languages that were
> covered are Finnish, Swedish, Norwegian (bokmål), Danish and German.
>
> There's a short presentation on part of this in the Proceedings of the
> 17th Scandinavian Conference of Linguistics:  Arppe, Antti; Voipio,
> Mari; Würtz, Malene 2000. Creating Inflecting Electronic Thesauri. In
> Lindberg, Carl-Erik & Nordahl Lund, Steffen 2000. 17th Scandinavian
> Conference of Linguistics Odense Working Papers in Language and
> Communication, No. 19, Vol. I, Institute of Language and
> Communication, University of Southern Denmark.
>
> In the case of these software tools, the generation was geared for the
> (limited) synonym content. In principle the same models could be
> applied for the language as a whole, but there are a variety of
> factors that have to be considered in such a case, e.g. variant
> inflected forms and errors in the underlying linguistic model which
> become apparent only when generation is applied.
>
> Though I have been talking here mostly about inflection, specifically
> the Finnish model has had a version where both derivations and
> inflections can be generated from root words, e.g.
>
> ympäri+dv-oida+dn-minen+nom+sg > ympäröiminen
> around+verbalize+nominalize+nominative+singular > encirclement
>
> I believe that this could be adapted rather easily to the other
> languages as well, since they're all based on the same theoretical
> principle, i.e. the TWOL model, which can be used for both
> morphological analysis and generation. Nevertheless, Lingsoft has not
> been otherwise very active regarding these tools, as far as I know.

Comment: I am familiar with GerTWOL, the German version of TWOL. A
link is given below.
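
Illustration: the generation input in Antti Arppe's example above is a
root followed by "+"-separated tags. The short Python sketch below only
splits such a string into root, derivational steps and inflectional
tags. It is not Lingsoft's implementation, and it does not produce the
surface form (that needs the two-level rules). The class and function
names are hypothetical, and reading "dv-" and "dn-" as verbalizing and
nominalizing derivation tags is an assumption taken from the gloss in
the example.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    root: str
    derivations: list = field(default_factory=list)  # (kind, suffix) pairs
    inflections: list = field(default_factory=list)  # plain tags, e.g. "nom"


def parse_analysis(text):
    """Split a string such as 'ympäri+dv-oida+dn-minen+nom+sg' into parts."""
    root, *tags = text.split("+")
    result = Analysis(root)
    for tag in tags:
        # Assumption: 'dv-' / 'dn-' mark derivational steps, as the gloss suggests.
        if tag.startswith(("dv-", "dn-")):
            kind, suffix = tag.split("-", 1)
            result.derivations.append((kind, suffix))
        else:
            result.inflections.append(tag)
    return result


analysis = parse_analysis("ympäri+dv-oida+dn-minen+nom+sg")
print(analysis.root, analysis.derivations, analysis.inflections)
```

Keeping derivational and inflectional tags in separate lists mirrors
the distinction made in the reply. A real generator would feed these
tags to the morphological model rather than merely list them.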

----
German

DeKo (for Derivation und Komposition, IMS, University of Stuttgart;
this is the project I worked in :-)
http://www.ims.uni-stuttgart.de/projekte/DeKo

Projekt Deutscher Wortschatz (University of Leipzig):
http://wortschatz.uni-leipzig.de

Deutsche Malaga Morphologie (University of Erlangen):
http://www.linguistik.uni-erlangen.de/~orlorenz/DMM/DMM.html

CISLEX (University of Munich):
http://www.cis.uni-muenchen.de/projects/CISLEX.html

GerTWOL (Lingsoft Inc.): http://www.lingsoft.fi/cgi-bin/gertwol

and there is a German version of WordManager (University of Basel &
Canoo)
http://www.wordmanager.com

---
Italian

Rodolfo Delmonte wrote:
>
> As to the morphology word formation system, of course we have our
> system for Italian (IMMORTALE) that generates/analyses derivations
> besides inflections. But no compound word, at least not yet. Even
> though we could regard cliticized verbs as a special type of compound
> word,
> - lasciamoglielo / (let's) leave it to him
> it requires clitic stripping and then inflection stripping, perhaps
> with derivation stripping too, in case the verb is not included in
> the dictionary  list.
> There's a number of published papers on it, they are listed in my website.
> website: http://project.cgm.unive.it
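
Illustration: the first step Rodolfo Delmonte mentions, clitic
stripping, can be sketched as longest-match suffix removal. This is
not how IMMORTALE works (I do not know its internals). The clitic list
below is a hypothetical, incomplete sample, and the function name and
the min_stem parameter are made up for the example.

```python
# Hypothetical, incomplete sample of Italian enclitics and clitic clusters.
CLITICS = [
    "glielo", "gliela", "glieli", "gliele", "gliene",
    "melo", "mela", "telo", "tela", "celo", "cela", "velo", "vela",
    "gli", "lo", "la", "li", "le", "mi", "ti", "ci", "vi", "si", "ne",
]


def strip_clitics(form, min_stem=3):
    """Return (stem, clitic) for the longest matching enclitic, else (form, None)."""
    for clitic in sorted(CLITICS, key=len, reverse=True):
        stem = form[: -len(clitic)]
        # min_stem is a crude guard against stripping almost the whole word.
        if form.endswith(clitic) and len(stem) >= min_stem:
            return stem, clitic
    return form, None


print(strip_clitics("lasciamoglielo"))  # ('lasciamo', 'glielo')
```

The remaining stem ("lasciamo") would then go on to inflection
stripping. Without a lexicon check this sketch overstrips: any word
that merely ends in "la" or "lo" would lose its ending, so a real
system has to validate the stem against its dictionary.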

---
Norwegian

Janne Bondi Johannessen wrote:

> For Norwegian, we have a compound analyser that also analyses
> productive derivation as part of our morphological tagger. It can be
> tested at : http://decentius.hit.uib.no:8005/cl/cgp/test.html

---
Romanian

Dan Tufis wrote:
>
> For Romanian I can give you at least three examples:
> 1) Dan Cristea's morphological analyser/generator in the early 1980's
> 2) my PARADIGM morphology learning system
> (described in the EACL89 proceedings: Tufis D., "It Would Be Much Easier If
> WENT Were GOED",
> in Harry Somers, Mary McGee Wood (eds.), Proceedings of the 4th EACL,
> Manchester, 1989, pp.145-152
> and in EACL91: Tufis D., Popescu O., "A Unified Management and Processing of
> Word-Forms, Idioms and Analytical Compounds", in Jurgen Kunze and Dorothy
> Reinman (eds.), Proceedings of the 5th EACL, Berlin, 1991, pp.95-100)
> 3) Dan Cristea's MICH classification-based system
> (described in Dan Cristea (1994): The Classification Language MICH, Research
> Report, LIMSI-CNRS, Universite Paris-Sud, Orsay.
> Dan Cristea (1993): The generation of Romanian Morphology. Research Report.
> University of Edinburgh).
>
> There is a new C-based PC-implementation of the LISP system 1) due to Stefan
> Andrei of University A.I. Cuza in Iasi
> (described in Andrei, St.: A Morphological Analyser for Romanian Language.
> The First EUROLAN Summer School
> in Natural Language Processing , Iasi - Romania, July 19-29, 1993)

---
Russian

"Sergei A. Koval" wrote:
>
> As for Russian, there is a system called RUSLO (abbreviated from the Russian
> "RUSskoye SLOvoobrazovaniye" = "Russian Derivation") developed by
> N.N.Pertsova, A.V.Cheremkhin, A.V.Rafaeva.
> Some details are available at
> http://194.226.57.46/uvk1838/Sciper/volume1/pertsova.htm

---
Turkish

Kemal Oflazer wrote:
>
> You may want to take a look at the morphological analyzer for Turkish
> reachable from http://www.sabanciuniv.edu/fens/people/oflazer/

Comment: I have tried this one out. It seems to do quite a lot and is
especially interesting because it treats both word formation and
inflection.


---

Multilingual

WordManager (German, English, Italian, ...): http://www.wordmanager.com

---

More general information about morphology systems (dealing mostly with
inflection) can be found at:

http://www.sil.org/computing/comp-morph-phon.html
http://www.xrce.xerox.com/competencies/content-analysis/fsnlp/morph.en.html




--
Dr. Anke Lüdeling
Institut für Kognitionswissenschaft, Universität Osnabrück
Katharinenstr. 24, 49069 Osnabrück, Germany
phone: +49-541-9694073
fax: +49-541-9696210
homepage: http://www.cogsci.uni-osnabrueck.de/~aluedeli


