7.960, Sum: Corpus analysis resources for Spanish

The Linguist List linguist at tam2000.tamu.edu
Mon Jul 1 19:03:52 UTC 1996


---------------------------------------------------------------------------
LINGUIST List:  Vol-7-960. Mon Jul 1 1996. ISSN: 1068-4875. Lines:  139
 
Subject: 7.960, Sum: Corpus analysis resources for Spanish
 
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
            Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu> (On Leave)
            T. Daniel Seely: Eastern Michigan U. <dseely at emunix.emich.edu>
 
Associate Editor:  Ljuba Veselinova <lveselin at emunix.emich.edu>
Assistant Editors: Ron Reck <rreck at emunix.emich.edu>
                   Ann Dizdar <dizdar at tam2000.tamu.edu>
                   Annemarie Valdez <avaldez at emunix.emich.edu>
 
Software development: John H. Remmers <remmers at emunix.emich.edu>
 
Editor for this issue: dizdar at tam2000.tamu.edu (Ann Dizdar)
 
---------------------------------Directory-----------------------------------
1)
Date:  Mon, 01 Jul 1996 14:00:06 +0200
From:  sancho at crea.rae.es ("J.L. Sancho, INSTITUTO DE LEXICOGRAFIA")
Subject:  Corpus analysis resources for Spanish
 
---------------------------------Messages------------------------------------
1)
Date:  Mon, 01 Jul 1996 14:00:06 +0200
From:  sancho at crea.rae.es ("J.L. Sancho, INSTITUTO DE LEXICOGRAFIA")
Subject:  Corpus analysis resources for Spanish
 
Dear all:
 
	A while back my colleague Maria Paula Santalla and I (Jose
Luis Sancho) posted an enquiry about corpus analysis resources for
Spanish.  The following is a summary of what we have been referred
to. We would like to thank the following people for their kind
responses (in no particular order): Max Louwerse, Mike Scott, Carlos
Subirats, Ken Litkowski, Jean Véronis, Yorick Wilks, Sandro
Pedrazzini, John Aberdeen, Ana Martínez, Nuno Miguel Cavalheiro
Marques and Ken Beesley. This list exhausts our inbox; if you
responded and are not mentioned above, please forgive us (or our
server) and send your reply again. Note that the enquiry was posted
to several lists, so some of the information quoted below may not
have come from this list. We apologize for any duplication.
 
 
##Max Louwerse (<M.M.Louwerse at stud.let.ruu.nl>) told us about the
Qualrs-lst, on which a lot of tagging software has been discussed. As
for software, he mentioned NUDIST (Sage Publishers) and Notabene,
whose homepage is
 
	http://sls-www.lcs.mit.edu/~flammia/Nb.html and
	ftp://sls-www.lcs.mit.edu/pub/flammia/Nb.
 
You can also email Giovanni Flammia (flammia at mit.edu).
 
##Mike Scott (<ms2928 at ac.uk>) suggested
 
	http://www.liv.ac.uk/~ms2928/wordsmit.html
 
This accesses WordSmith Tools (Oxford Univ. Press 1996).
 
##Carlos Subirats (<lali1 at uab.es>) pointed to an 'Etiquetador y
desambiguizador del espanol' (a tagger and disambiguator for Spanish),
developed by the Laboratorio de Linguistica Informatica de la
Universidad Autonoma de Barcelona. The address provided is
 
	Carlos Subirats Ruggeberg
	Universidad Autonoma de Barcelona
	Laboratorio de Linguistica Informatica
	Edificio B
	08193 Bellaterra, Spain
 
	e-mail: c.subirats at oasis.uab.es
	e-mail: c.subirats at cc.uab.es
	Fax: (343)-581-16-86
	Tel: (343)-581-22-29
 
 
##Ken Litkowski <71520.307 at CompuServe.COM> directed us to some
dictionary utilities for creating and maintaining lexica.  A
description of this software is available at
 
	http://www.clres.com
 
##Jean Véronis (<veronis at univ-aix.fr>) suggested a look at
 
   http://www.lpl.univ-aix.fr/projects/multext/
 
and contacting Nuria Bel (nuria at gilcub.es).
 
##Yorick Wilks (<yorick at dcs.shef.ac.uk>) pointed us to david at crl.nmsu.edu.
 
##Sandro Pedrazzini (<sandro at idsia.ch>) pointed to a system with
which you can not only create and maintain lexica but also generate
different kinds of taggers and lemmatizers. A description of it
can be found at
 
	http://www.ifi.unibas.ch/grudo/grudo.html
	http://www.idsia.ch/wordmanager.html
 
##John Aberdeen (<aberdeen at mitre.org>) mentioned a fast
part-of-speech tagger based on Eric Brill's notion of
transformation-based error-driven learning.
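
For readers unfamiliar with the transformation-based idea, here is a
minimal sketch of the rule-application stage only: each word first
gets a default tag from a lexicon, and an ordered list of contextual
rules then rewrites tags. This is not the tagger John Aberdeen
mentioned; the lexicon, the single rule, and every name in the code
are assumptions made up for illustration, and the learning step that
would derive the rules from a tagged corpus is omitted.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule: change from_tag to to_tag when the previous tag is prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def initial_tags(words, lexicon, default="NOUN"):
    # Start from each word's default tag (in a real system, its most frequent tag).
    return [lexicon.get(w, default) for w in words]


def apply_rules(tags, rules):
    # Apply the rules in order; each rule sweeps the sentence left to right.
    tags = list(tags)
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


# Toy data (invented): "vino" is ambiguous between noun ("wine") and verb ("came").
LEXICON = {"el": "DET", "hombre": "NOUN", "vino": "NOUN"}
RULES = [Rule(from_tag="NOUN", to_tag="VERB", prev_tag="NOUN")]

words = ["el", "hombre", "vino"]
print(apply_rules(initial_tags(words, LEXICON), RULES))
```

In this toy setting the rule retags "vino" as a verb after a noun but
leaves it as a noun directly after the determiner.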
 
##Ana Martínez (<sysnet at bitmailer.net>) mentioned MABLe, a
'multilingual letter authoring tool'.
 
##Nuno Miguel Cavalheiro Marques (<nmm at di.fct.unl.pt>) brought to our
attention two POS taggers, one using an HMM with Viterbi decoding and
the other using neural networks. You can find a short review of this
work at
 
	http://www-ia.di.fct.unl.pt/~nmm
	http://www-ia.di.fct.unl.pt/~glint/Glint
 
There you can also access an article about POLARIS, a morphological
lexical acquisition and retrieval database system. Contacting
Gabriel Lopes (gpl at fct.unl.pt) was also suggested.
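
As a rough illustration of what HMM tagging with Viterbi decoding
involves, the sketch below finds the most probable tag sequence under
a bigram HMM. It is not the code of the taggers referred to above; the
tag set, all probabilities, and the function and variable names are
invented for the example, and a real tagger would estimate the
probabilities from a tagged corpus.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for words under a bigram HMM.

    Missing probabilities fall back to the small constant unk (an assumption).
    """
    # best[i][t] = (log-probability of the best path ending in tag t at i, previous tag)
    best = [{}]
    for t in tags:
        p = start_p.get(t, unk) * emit_p[t].get(words[0], unk)
        best[0][t] = (math.log(p), None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = math.log(emit_p[t].get(words[i], unk))
            best[i][t] = max(
                (best[i - 1][s][0] + math.log(trans_p[s].get(t, unk)) + emit, s)
                for s in tags
            )
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


# Toy model (invented numbers): "vino" may be a noun ("wine") or a verb ("came").
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.2, "NOUN": 0.2, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT_P = {
    "DET": {"el": 0.5, "la": 0.5},
    "NOUN": {"casa": 0.4, "vino": 0.3, "hombre": 0.3},
    "VERB": {"casa": 0.2, "vino": 0.4, "come": 0.4},
}

print(viterbi(["el", "hombre", "vino"], TAGS, START_P, TRANS_P, EMIT_P))
```

Working in log space avoids numerical underflow on longer sentences,
which is why the sketch sums logarithms instead of multiplying
probabilities.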
 
##Ken Beesley (<Ken.Beesley at Grenoble.RXRC.Xerox.com>) noted that the
Rank Xerox Research Centre in Grenoble, France, has developed systems
for Spanish covering tokenization (word/term division); morphological
analysis (for syntax, or, less detailed, for tagging); a
part-of-speech "guesser" (for words not found by the morphological
analysis); and tagging (based on an HMM tagger, trained on a corpus).
You can experiment with the morphological analysis and tagger at
 
	http://www.xerox.fr/grenoble/mltt/home.html
 
 
Thank you very much again. See you on the net.
 
 
 
Jose Luis Sancho                        Maria Paula Santalla
sancho at crea.rae.es                      santalla at crea.rae.es
------------------------------------------------------------------------
LINGUIST List: Vol-7-960.


