[Corpora-List] Finite-state parsing

Michael Maxwell maxwell at umiacs.umd.edu
Tue Aug 7 15:54:17 UTC 2007


Hrafn Loftsson wrote:
> Can anyone point to papers about finite-state parsing methods for
> morphologically complex languages, e.g. Slavic languages?

There's this:

   Sproat, Richard William. 1992. Morphology and computation.
   Cambridge, Mass.: MIT Press.

A bit dated now, but still worth looking at.

More recently, there's this:

   Beesley, Kenneth R., and Karttunen, Lauri. 2003.
   Finite State Morphology. CSLI Studies in
   Computational Linguistics. Chicago: University
   of Chicago Press.

This comes with a CD containing executable versions of the Xerox finite
state tools.  The CD is outdated (the version of the tools on it doesn't
handle Unicode, and there have been bugfixes since), but the authors
provide updates.  A new version of the book is due out Real Soon Now.

While I don't think either of these books talks about Slavic languages,
they do discuss other morphologically complex languages.  In particular,
the Beesley and Karttunen book provides a number of worked examples,
including some in Arabic and Tagalog.  And I can testify that the Xerox
tools cope well with morphological complexity--I've done transducers for
Cebuano and Nahuatl, both of which I would consider to be "interesting"
from a morphology standpoint.

There are a number of other finite state toolkits "out there", but they
are perhaps less well documented than the Xerox ones.  BTW, the Xerox
tools include a compiler for two-level rules, but they also let you write
linearly ordered phonological rules.  For a linguist like me, that is
*much* easier than trying to learn and work with Two Level Phonology.
(Your mileage may vary...)
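
To give a rough idea of what "linearly ordered" means here, below is a
minimal Python sketch.  It is not the Xerox tools or their rule syntax:
the two rules, the form "kaNpat", and the names RULES and apply_rules are
made up for illustration, and plain regex substitution stands in for
compiling and composing transducers.  Each rule rewrites the output of
the one before it, so the order of the list matters.

```python
import re

# Hypothetical toy rules, applied top to bottom.
# Each entry is (compiled context pattern, replacement string).
RULES = [
    (re.compile(r"N(?=p)"), "m"),   # N -> m when followed by p
    (re.compile(r"(?<=m)p"), "m"),  # p -> m when preceded by m
]


def apply_rules(lexical, rules=RULES):
    """Rewrite a lexical form by applying each rule to the previous output."""
    form = lexical
    for pattern, replacement in rules:
        form = pattern.sub(replacement, form)
    return form


if __name__ == "__main__":
    # First rule feeds the second: kaNpat -> kampat -> kammat
    print(apply_rules("kaNpat"))
    # Reversed order: the p rule finds no preceding m, so only N changes
    print(apply_rules("kaNpat", list(reversed(RULES))))
```

Unlike a real transducer, this sketch only runs in the generation
direction (lexical form to surface form); it cannot be turned around to
analyze a surface form.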

There have also been conferences on this topic, including an EACL one in
Hungary in 2003, and aperiodic ones since then.

   Mike Maxwell
   CASL/ U MD




