[Corpora-List] Hi

Francis Tyers ftyers at prompsit.com
Wed Nov 18 08:53:44 UTC 2009


On Tue, 17 Nov 2009 at 17:23 +0100, Harald Hammarström wrote:
> Dear Rye Abdi,
> Maybe the following papers are relevant to your question. All the best, H
> 
> Abdillahi, Nimaan, Pascal Nocera & Juan-Manuel Torres-Moreno. 2006. Boites
> à outils TAL pour les langues peu informatisées: le cas du Somali. In
> Journées d'Analyses des Données Textuelles (JADT 06), 697-705.
> Besançon-France
> 
> 
> Hurskainen, A. 1992. A Two-Level Computer Formalism for the Analysis of
> Bantu Morphology: An Application to Swahili. Nordic Journal of African
> Studies 1(1). 87-119.
> 
> Pauw, G. De & G.-M. de Schryver. 2008. Improving the Computational
> Morphological Analysis of a Swahili Corpus for Lexicographic Purposes.
> Lexikos 18. 303-318.
> 
> Pauw, G. De, G.-M. de Schryver & P.W. Wagacha. 2006. Data-driven
> part-of-speech tagging of Kiswahili. In Proceedings of Text, Speech and
> Dialogue, 9th International Conference (LNAI 4188), 197-204. Berlin:
> Springer-Verlag

I would add to this that it is no longer necessary to use the proprietary
Xerox toolkit for finite-state morphology. There are two excellent free
software (GPL) projects which implement the formalism:

* HFST -- for lexc and twol 
  http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/

* Foma -- for lexc and xfst
  http://foma.sourceforge.net/

Both have been tested on a wide variety of lexicons "in the wild" and
the authors are actively maintaining the software and keen to hear
comments and suggestions.

By all means buy the FSM book (it really is fantastic), but be aware
that there are free alternatives to the toolkit now.

Regards,

Fran


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


