[Corpora-List] Product Announcement: Canoo Morphology Software becomes Multilingual

Sandra Wendland sandra.wendland at canoo.com
Thu Apr 10 14:18:17 UTC 2003


Product Announcement

Morphology Software becomes Multilingual
WMTrans Products Now Available for English

Canoo extends its product range to cover further languages: in addition to German, the Basel-based company now offers its morphology
software, WMTrans, for English. Products for Italian are in preparation.

English versions of the
	WMTrans Lemmatizer
	WMTrans Inflection Analyzer
	WMTrans Inflection Analyzer/Generator
are now available at the product site:
http://www.canoo.com/wmtrans

Like their German counterparts, these Word Manager Transducers (WMTrans) provide functionality to analyze and generate inflected
English word forms.

The smart text processing software is used in retrieval and language processing applications. Typical use cases include word
stemming, intelligent search, text indexing, text mining, language learning, hyperlink generation, spell checking, grammar checking,
and machine translation.

The WMTrans Lemmatizer determines the base form of a word and its category. Consider the word form "went":
The Lemmatizer analyses the word, retrieves its base form, "go", and determines the word category, "V (verb)".

	went -> go (Cat V)
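
For developers who want to post-process results in the textual form shown above, the following is a minimal sketch in Java. It only parses a line such as "went -> go (Cat V)" into its three parts. The class LemmaLine and its parse method are hypothetical names chosen for illustration and are not part of the WMTrans API; the sketch also assumes a current Java version rather than JRE 1.3.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of WMTrans: parses one line in the
// textual format used in this announcement, e.g. "went -> go (Cat V)".
final class LemmaLine {
    // word form, "->", base form, then a single "(Cat X)" group
    private static final Pattern LINE =
        Pattern.compile("^\\s*(\\S+)\\s*->\\s*(\\S+)\\s*\\(Cat\\s+(\\w+)\\)\\s*$");

    final String wordForm;
    final String baseForm;
    final String category;

    private LemmaLine(String wordForm, String baseForm, String category) {
        this.wordForm = wordForm;
        this.baseForm = baseForm;
        this.category = category;
    }

    // Returns the parsed line, or throws if the text does not match the format.
    static LemmaLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized line: " + line);
        }
        return new LemmaLine(m.group(1), m.group(2), m.group(3));
    }

    public static void main(String[] args) {
        LemmaLine r = parse("went -> go (Cat V)");
        System.out.println(r.wordForm + " | " + r.baseForm + " | " + r.category);
    }
}
```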

The WMTrans Inflection Analyzer processes any word form and returns detailed information on its inflection.

For example, for the word form "plants", the Inflection Analyzer returns two analyses. Both share the base form "plant"; one has the
word category "N (noun)" and the other "V (verb)", each with further morphosyntactic details such as number or verb tense:

	plants ->
	plant (Cat N)(Num PL)
	plant (Cat V)(Tense Present)(VForm s)
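
The bracketed features in such an analysis can be read as name/value pairs. As a rough sketch, assuming the notation is exactly the "(Name Value)" sequence shown above, the following Java code turns one analysis into an ordered map. FeatureBundle is a hypothetical helper written for this illustration and is not a WMTrans class; it assumes a current Java version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of WMTrans: reads a feature string such as
// "(Cat V)(Tense Present)(VForm s)" into name/value pairs.
final class FeatureBundle {
    // One "(Name Value)" group; names and values are word characters.
    private static final Pattern PAIR = Pattern.compile("\\((\\w+)\\s+(\\w+)\\)");

    // Returns the features in the order they appear in the input.
    static Map<String, String> parse(String bundle) {
        Map<String, String> features = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(bundle);
        while (m.find()) {
            features.put(m.group(1), m.group(2));
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(parse("(Cat N)(Num PL)"));
        System.out.println(parse("(Cat V)(Tense Present)(VForm s)"));
    }
}
```

An application could then, for instance, keep only the analyses whose "Cat" value is "N" when it needs noun readings.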


WMTrans Inflection Analyzer/Generator offers two function calls. The Analyzer returns morphosyntactic information for an input word,
such as its base form, word category, gender, case, tense, and auxiliary verbs. The Generator delivers a list of all word forms related
to a base form; each word form is followed by the morphosyntactic features that apply to it. See the Inflection
Analyzer/Generator output for the base form "plant":

	Analyzer:
	plant
	(Cat N)(Num SG)
	(Cat V)(Tense Present)(VForm Base)

	Generator:
	plant
	(Cat N)(Num SG)
	(Cat V)(Tense Present)(VForm Base)
	planted
	(Cat V)(Tense Past)
	(Cat V)(VForm Past_Participle)
	planting
	(Cat V)(VForm ing_Participle)
	plants
	(Cat N)(Num PL)
	(Cat V)(Tense Present)(VForm s)
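
One way to use a Generator listing like the one above, for example in text indexing, is to map every generated word form back to its base form. The sketch below assumes the plain-text layout shown in this announcement (a word form on one line, followed by its feature lines starting with a parenthesis). ParadigmListing and its methods are hypothetical names for illustration only and say nothing about how WMTrans itself exposes its results; the code assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of WMTrans: reads a Generator-style listing
// and builds a lookup from each word form to its base form.
final class ParadigmListing {
    // Groups feature lines (starting with "(") under the preceding word form.
    static Map<String, List<String>> parse(String listing) {
        Map<String, List<String>> paradigm = new LinkedHashMap<>();
        List<String> current = null;
        for (String raw : listing.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("(")) {
                if (current == null) {
                    throw new IllegalArgumentException("Analysis before any word form: " + line);
                }
                current.add(line);
            } else {
                current = paradigm.computeIfAbsent(line, k -> new ArrayList<>());
            }
        }
        return paradigm;
    }

    // Maps every word form in the paradigm to the given base form.
    static Map<String, String> formToBase(String baseForm, Map<String, List<String>> paradigm) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String form : paradigm.keySet()) {
            index.put(form, baseForm);
        }
        return index;
    }

    public static void main(String[] args) {
        String listing = String.join("\n",
            "plant", "(Cat N)(Num SG)", "(Cat V)(Tense Present)(VForm Base)",
            "planted", "(Cat V)(Tense Past)", "(Cat V)(VForm Past_Participle)",
            "planting", "(Cat V)(VForm ing_Participle)",
            "plants", "(Cat N)(Num PL)", "(Cat V)(Tense Present)(VForm s)");
        Map<String, List<String>> paradigm = parse(listing);
        System.out.println(formToBase("plant", paradigm));
    }
}
```

With such a lookup, a search for "planted" or "plants" can be normalized to "plant" before the index is queried.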

The WMTrans products for English are available in Java and run on any platform with Java Runtime Environment (JRE) 1.3 or higher.

WMTrans is based on the Canoo morphological dictionaries, containing
-	more than 250'000 lexemes, generating 3 million fully categorized word forms for German,
-	and 50'000 lexemes, generating 115'000 word forms for English.

The dictionaries include information on word formation dependencies, all types of morphological irregularities and spelling
variants, e.g. differences between American and British English spelling. In addition, contractions such as "can't", "he'll",
"mother's" and "parents'" are analyzed successfully, bringing the number of recognized English word forms to 220'000.

WMTrans allows developers to integrate functions such as word stemming, spell checking, and paradigm generation in their
applications.

For more information, see:
http://www.canoo.com/wmtrans/

or contact:
Elisabeth Maier
Canoo Engineering AG
Kirschgartenstr. 7
CH-4051 Basel
Tel.: +41 61 228 94 44
mailto:wmtrans-info at canoo.com


---

Sandra Wendland
Canoo Engineering AG
Kirschgartenstrasse 7
CH-4051 Basel
Tel. +41 61 228 94 66
Fax +41 61 228 94 49
mailto:sandra.wendland at canoo.com
Web: http://www.canoo.com


