Corpora: Parsing morphologically rich languages
Martin Wynne
martin at clg.bham.ac.uk
Mon Jan 22 11:31:57 UTC 2001
The EAGLES 'Recommendations for the Morphosyntactic Annotation of
Corpora' (available at
http://www.ilc.pi.cnr.it/EAGLES/annotate/annotate.html) provide a
formalism which can deal with values for multiple morphosyntactic
categories in a single tag, and also has facilities for dealing with
underspecification and ambiguity. The tag is a linear string of
characters, where each character represents a value for a particular
morphosyntactic feature. For example (from the document cited above):
- A common noun, feminine, plural, countable, is represented: N122010
- A 3rd person, singular, finite, indicative, past tense, active, main,
  non-phrasal, non-reflexive verb is represented: V3011141101200
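
To make the positional idea concrete, here is a minimal Python sketch that unpacks the noun tag from the first example. The attribute order and the value tables are my assumptions, inferred from that single example rather than copied from the EAGLES document, so they should be checked against the recommendations before any real use. The function name and the treatment of '0' as "not applicable or unspecified" are likewise only illustrative.

```python
# Hypothetical layout of a noun tag, inferred from the example "N122010".
# Each position after the leading "N" holds one attribute; the value tables
# are deliberately incomplete and are assumptions, not the official tables.
NOUN_ATTRIBUTES = [
    ("type", {"1": "common"}),
    ("gender", {"2": "feminine"}),
    ("number", {"2": "plural"}),
    ("case", {}),            # values not filled in here
    ("countability", {"1": "countable"}),
    ("definiteness", {}),    # values not filled in here
]


def decode_noun_tag(tag):
    """Map a positional noun tag to a dict of attribute -> value."""
    if not tag.startswith("N"):
        raise ValueError("not a noun tag: %r" % tag)
    codes = tag[1:]
    if len(codes) != len(NOUN_ATTRIBUTES):
        raise ValueError("expected %d positions, got %d"
                         % (len(NOUN_ATTRIBUTES), len(codes)))
    features = {}
    for (name, values), code in zip(NOUN_ATTRIBUTES, codes):
        if code == "0":
            # Assumed here to mean "not applicable or unspecified".
            continue
        # Unknown codes are kept in raw form instead of being guessed.
        features[name] = values.get(code, "code " + code)
    return features


print(decode_noun_tag("N122010"))
```

Because every attribute has its own position, a grammar or tagger can test a single feature (say, number) without enumerating every full tag, which is what keeps a large tagset manageable.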
As far as I know, these recommendations were drawn up for and have been
used with mainly West European languages such as English, French and
Italian, but it seems to me that they could usefully be applied to
morphologically richer inflectional languages.
Martin
On Fri, Jan 12, 2001 at 03:18:34PM +0100, Alexander Mikhailian <mikhailian@altern.org> wrote:
> Hello,
>
> I am looking for references to syntactic parsers
> that deal with morphologically rich inflectional languages.
>
> In particular, I am interested in :
>
> 1. Approaches to deal with the number of POS tags
> (terminals) that would supposedly be larger
> than for English or French, e.g. if one tries
> to build a list of POS tags for a morphologically
> rich language in order to follow approaches
> developed for English, this list may easily grow
> to thousands of entries, which implies that grammars
> using such a huge list of terminals would be quite
> complicated.
>
> 2. Approaches to deal with the free or loosely
> restricted order of words that is often proper to
> morphologically rich languages and which requires
> different parsing techniques than for English,
> where a common shift/reduce parser is often sufficient.
>
> Thanks in advance,
>
> --
> Alexander Mikhailian
--
Martin Wynne Centre for Corpus Research,
Coordinator, TRACTOR Network Department of English,
www.tractor.de Birmingham University
Tel: +44 (0)121 414 2763 Birmingham
Fax: +44 (0)121 414 6053 UK - B15 2TT
email: martin at clg.bham.ac.uk