Corpora: word-formation systems
Anke Lüdeling
aluedeli at uos.de
Wed May 8 12:52:55 UTC 2002
Dear list members,
last week I asked for information about computational morphology
systems that deal with word-formation. I received a number of helpful
replies - thank you very much.
I would like to thank the following colleagues:
Antti Arppe
Janne Bondi Johannessen
Rodolfo Delmonte
Sergei A. Koval
Kemal Oflazer
Dan Tufis
Alexander S. Yeh
Below I have summarized, by language, the responses I received together
with the information on word-formation systems that I already had. I
have given the URL where available and commented where I could (I
haven't yet had the time to look at all the links and papers that were
provided, but I will certainly try to do so).
---
English
ALE-RA
http://nl.ijs.si/et/Thesis/ALE-RA/
"Alexander S. Yeh" wrote:
> UMLS (Unified Medical Language System?) is a U.S. Government program that
> provides among other things, a free morphological variation system for
> mainly English medical terms.
---
Finnish (& other languages)
Antti Arppe wrote:
> A Finnish language technology company, Lingsoft <www.lingsoft.fi> has
> used their morphological models (based on the two-level principle and
> model by Koskenniemi) for generating inflected word forms in
> inflecting thesauri, i.e. synonym dictionaries that can handle the
> inflected forms of the synonyms as well. The languages that were
> covered are Finnish, Swedish, Norwegian (bokmål), Danish and German.
>
> There's a short presentation on part of this in the Proceedings of the
> 17th Scandinavian Conference of Linguistics: Arppe, Antti; Voipio,
> Mari; Würtz, Malene 2000. Creating Inflecting Electronic Thesauri. In
> Lindberg, Carl-Erik & Nordahl Lund, Steffen 2000. 17th Scandinavian
> Conference of Linguistics Odense Working Papers in Language and
> Communication, No. 19, Vol. I, Institute of Language and
> Communication, University of Southern Denmark.
>
> In the case of these software tools, the generation was geared for the
> (limited) synonym content. In principle the same models could be
> applied for the language as a whole, but there are a variety of
> factors that have to be considered in such a case, e.g. variant
> inflected forms and errors in the underlying linguistic model which
> become apparent only when generation is applied.
>
> Though I have been talking here mostly about inflection, specifically
> the Finnish model has had a version where both derivations and
> inflections can be generated from root words, e.g.
>
> ympäri+dv-oida+dn-minen+nom+sg > ympäröiminen
> around+verbalize+nominalize+nominative+singular > encirclement
>
> I believe that this could be adapted rather easily to the other
> languages as well, since they're all based on the same theoretical
> principle, i.e. the TWOL model, which can be used for both
> morphological analysis and generation. Nevertheless, Lingsoft has not
> been otherwise very active regarding these tools, as far as I know.
Comment: I am familiar with GerTWOL, the German version of TWOL. A
link is given below.
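
To make the analysis string in the quoted example concrete, here is a
minimal Python sketch that splits such a string into a root and its
'+'-separated tags and reproduces the gloss line. It only illustrates
the notation: the function names and the gloss table are assumptions
made for this example (the glosses are copied from the gloss line
above), and the sketch does not implement two-level rules or any part
of the Lingsoft software.

```python
# Gloss table copied from the gloss line in the quoted example; the full
# tag inventory of the real system is not given in the message.
TAG_GLOSSES = {
    "dv-oida": "verbalize",
    "dn-minen": "nominalize",
    "nom": "nominative",
    "sg": "singular",
}


def parse_analysis(analysis):
    """Split a '+'-separated analysis string into (root, [tags])."""
    root, *tags = analysis.split("+")
    return root, tags


def gloss(analysis, root_gloss):
    """Replace each tag by its gloss; unknown tags are kept unchanged."""
    _root, tags = parse_analysis(analysis)
    return "+".join([root_gloss] + [TAG_GLOSSES.get(tag, tag) for tag in tags])


if __name__ == "__main__":
    example = "ympäri+dv-oida+dn-minen+nom+sg"
    print(parse_analysis(example))
    print(gloss(example, "around"))
```

Producing the surface form (ympäröiminen) from such a string is the
job of the lexicon and the two-level rules and is not attempted here.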
---
German
DeKo (Derivation und Komposition), IMS, University of Stuttgart;
this is the project I worked in :-)
http://www.ims.uni-stuttgart.de/projekte/DeKo
Projekt Deutscher Wortschatz (University of Leipzig):
http://wortschatz.uni-leipzig.de
Deutsche Malaga Morphologie (University of Erlangen):
http://www.linguistik.uni-erlangen.de/~orlorenz/DMM/DMM.html
CISLEX (University of Munich):
http://www.cis.uni-muenchen.de/projects/CISLEX.html
GerTWOL (Lingsoft Inc.): http://www.lingsoft.fi/cgi-bin/gertwol
and there is a German version of WordManager (University of Basel &
Canoo)
http://www.wordmanager.com
---
Italian
Rodolfo Delmonte wrote:
>
> As to the morphology word formation system, of course we have our
> system for Italian (IMMORTALE) that generates/analyses derivations
> besides inflections. But no compound word, at least not yet. Even
> though we could regard cliticized verbs as a special type of compound
> word,
> - lasciamoglielo / (let's) leave it to him
> it requires clitic stripping and then inflection stripping, perhaps
> with derivation stripping too, in case the verb is not included in
> the dictionary list.
> There's a number of published papers on it, they are listed in my website.
> website: http://project.cgm.unive.it
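
The first step mentioned in the reply above, clitic stripping, can be
sketched roughly as follows. This is a toy illustration only, not the
IMMORTALE implementation: the clitic list, the function name and the
set of known verb forms are all hypothetical, and the dictionary
lookup merely stands in for the inflection (and derivation) stripping
that a real analyser would perform on the remaining form.

```python
# Hypothetical, deliberately tiny list of enclitic clusters.
CLITICS = ["glielo", "gliela", "gliene", "lo", "la", "li", "le",
           "mi", "ti", "ci", "vi", "ne"]


def strip_clitics(form, known_verb_forms):
    """Return (host, clitic) if removing a final clitic cluster leaves a
    known verb form; otherwise return (form, None).

    Longer clusters are tried first so that 'glielo' wins over 'lo'.
    """
    for clitic in sorted(CLITICS, key=len, reverse=True):
        if form.endswith(clitic) and len(form) > len(clitic):
            host = form[:-len(clitic)]
            if host in known_verb_forms:
                return host, clitic
    return form, None


if __name__ == "__main__":
    # Assumed mini-lexicon of inflected verb forms for the example.
    verb_forms = {"lasciamo"}
    print(strip_clitics("lasciamoglielo", verb_forms))
    print(strip_clitics("tavolo", verb_forms))
```

The lookup against known verb forms is what keeps an ordinary noun
such as "tavolo" from being split into "tavo" + "lo".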
---
Norwegian
Janne Bondi Johannessen wrote:
> For Norwegian, we have a compound analyser that also analyses
> productive derivation as part of our morphological tagger. It can be
> tested at : http://decentius.hit.uib.no:8005/cl/cgp/test.html
---
Romanian
Dan Tufis wrote:
>
> For Romanian I can give you at least three examples:
> 1) Dan Cristea's morphological analyser/generator in the early 1980's
> 2) my PARADIGM morphology learning system
> (described in the EACL89 proceedings: Tufis D., "It Would Be Much Easier If
> WENT Were GOED",
> in Harry Somers, Mary McGee Wood (eds.), Proceedings of the 4th EACL,
> Manchester, 1989, pp.145-152
> and in EACL91: Tufis D., Popescu O., "A Unified Management and Processing of
> Word-Forms, Idioms and Analytical Compounds", in Jurgen Kunze and Dorothy
> Reinman (eds.), Proceedings of the 5th EACL, Berlin, 1991, pp.95-100)
> 3) Dan Cristea's MICH classification-based system
> (described in Dan Cristea (1994): The Classification Language MICH, Research
> Report, LIMSI-CNRS, Universite Paris-Sud, Orsay.
> Dan Cristea (1993): The generation of Romanian Morphology. Research Report.
> University of Edinburgh).
>
> There is a new C-based PC-implementation of the LISP system 1) due to Stefan
> Andrei of University A.I. Cuza in Iasi
> (described in Andrei, St.: A Morphological Analyser for Romanian Language.
> The First EUROLAN Summer School
> in Natural Language Processing , Iasi - Romania, July 19-29, 1993)
---
Russian
"Sergei A. Koval" wrote:
>
> As for Russian, there is a system called RUSLO (abbreviated from the Russian
> "RUSskoye SLOvoobrazovaniye" = "Russian Derivation") developed by
> N.N.Pertsova, A.V.Cheremkhin, A.V.Rafaeva.
> Some details are available at
> http://194.226.57.46/uvk1838/Sciper/volume1/pertsova.htm
---
Turkish
Kemal Oflazer wrote:
>
> You may want to take a look at the morphological analyzer for Turkish
> reachable from http://www.sabanciuniv.edu/fens/people/oflazer/
I have tried this one out - it seems to do quite a lot, and it is
especially interesting since it treats both word formation and
inflection.
---
Multilingual
WordManager (German, English, Italian, ...)
---
More general information about morphology systems (dealing mostly with
inflection) can be found at:
http://www.sil.org/computing/comp-morph-phon.html
http://www.xrce.xerox.com/competencies/content-analysis/fsnlp/morph.en.html
--
Dr. Anke Lüdeling
Institut für Kognitionswissenschaft, Universität Osnabrück
Katharinenstr. 24, 49069 Osnabrück, Germany
phone: +49-541-9694073
fax: +49-541-9696210
homepage: http://www.cogsci.uni-osnabrueck.de/~aluedeli