Corpora: Parsing morphologically rich languages

Martin Wynne martin at clg.bham.ac.uk
Mon Jan 22 11:31:57 UTC 2001


The EAGLES 'Recommendations for the Morphosyntactic Annotation of
Corpora' (available at
http://www.ilc.pi.cnr.it/EAGLES/annotate/annotate.html) provide a
formalism which can deal with values for multiple morphosyntactic
categories in a single tag, and also has facilities for dealing with
underspecification and ambiguity. The tag is a linear string of
characters, where each character represents a value for a particular
morphosyntactic feature. For example (from the document cited above):

- A common noun, feminine, plural, countable, is represented: N122010
- A 3rd person, singular, finite, indicative, past tense, active, main verb,
  non-phrasal, non-reflexive, verb is represented: V3011141101200
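
To make the positional idea concrete, here is a rough sketch in Python of
how such a tag might be unpacked into attribute-value pairs. The attribute
order and value codes in NOUN_SCHEMA are my own assumptions, chosen only so
that they agree with the noun example above; the EAGLES document itself is
the authority on the real tables. Treating '0' as "not applicable or left
unspecified" is likewise an assumption, and decode_tag and matches are
hypothetical helper names.

```python
# Illustrative schema only: attribute order and value codes are assumptions
# consistent with the example N122010, not a copy of the EAGLES tables.
NOUN_SCHEMA = [
    ("type", {"1": "common", "2": "proper"}),
    ("gender", {"1": "masculine", "2": "feminine", "3": "neuter"}),
    ("number", {"1": "singular", "2": "plural"}),
    ("case", {"1": "nominative", "2": "genitive"}),
    ("countability", {"1": "countable", "2": "mass"}),
    ("definiteness", {"1": "definite", "2": "indefinite"}),
]

# One schema per major category; only nouns are sketched here.
SCHEMAS = {"N": NOUN_SCHEMA}


def decode_tag(tag):
    """Turn a positional tag such as 'N122010' into a feature dictionary."""
    pos, codes = tag[0], tag[1:]
    schema = SCHEMAS[pos]
    if len(codes) != len(schema):
        raise ValueError(f"expected {len(schema)} positions after {pos!r}")
    features = {"pos": pos}
    for (name, values), code in zip(schema, codes):
        if code == "0":
            # Assumption: '0' means the attribute is not applicable or unset.
            continue
        # Unknown codes are kept as raw characters rather than guessed at.
        features[name] = values.get(code, code)
    return features


def matches(tag, **constraints):
    """True if every requested attribute has the requested value.

    Attributes not mentioned in the constraints are ignored, so a grammar
    rule can stay underspecified instead of naming one tag per combination.
    """
    features = decode_tag(tag)
    return all(features.get(k) == v for k, v in constraints.items())


print(decode_tag("N122010"))
print(matches("N122010", number="plural"))
```

The point of the sketch is that a grammar can refer to individual features
(for instance, matches(tag, number="plural")) rather than to thousands of
atomic tags.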

As far as I know, these recommendations were drawn up for and have been
used with mainly West European languages such as English, French and
Italian, but it seems to me that they could be usefully applied to more
morphologically rich inflectional languages.

Martin

On Fri, Jan 12, 2001 at 03:18:34PM +0100, Alexander Mikhailian <mikhailian@altern.org> wrote:
> Hello,
>
> I am looking for references to syntactic parsers
> that deal with morphologically rich flexive languages.
>
> In particular, I am interested in:
>
> 1. Approaches to deal with the number of POS tags
> (terminals), which would supposedly be larger
> than for English or French. E.g., if one tries
> to build a list of POS tags for a morphologically
> rich language in order to follow approaches
> developed for English, this list may easily grow
> to thousands of entries, which implies that grammars
> using such a huge list of terminals would be quite
> complicated.
>
> 2. Approaches to deal with the free or loosely
> restricted order of words that is often proper to
> morphologically rich languages and which requires
> different parsing techniques than for English,
> where a common shift/reduce parser is often sufficient.
>
> Thanks in advance,
>
> --
> Alexander Mikhailian

--
Martin Wynne			Centre for Corpus Research,
Coordinator, TRACTOR Network	Department of English,
www.tractor.de			Birmingham University
Tel: +44 (0)121 414 2763	Birmingham
Fax: +44 (0)121 414 6053	UK - B15 2TT
email: martin at clg.bham.ac.uk


