9.1330, Sum: Spanish Corpora

LINGUIST Network linguist at linguistlist.org
Fri Sep 25 11:01:54 UTC 1998


LINGUIST List:  Vol-9-1330. Fri Sep 25 1998. ISSN: 1068-4875.

Subject: 9.1330, Sum: Spanish Corpora

Moderators: Anthony Rodrigues Aristar: Wayne State U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie: U. of Arizona <carnie at linguistlist.org>

Reviews: Andrew Carnie: U. of Arizona <carnie at linguistlist.org>

Associate Editors:  Martin Jacobsen <marty at linguistlist.org>
                    Brett Churchill <brett at linguistlist.org>
                    Ljuba Veselinova <ljuba at linguistlist.org>

Assistant Editors:  Scott Fults <scott at linguistlist.org>
		    Jody Huellmantel <jody at linguistlist.org>
		    Karen Milligan <karen at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Chris Brown <chris at linguistlist.org>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/


Editor for this issue: Martin Jacobsen <marty at linguistlist.org>

=================================Directory=================================

1)
Date:  Fri, 25 Sep 1998 15:03:04 +0200 (MET DST)
From:  Eva Remberger <eremberg at spinfo.uni-koeln.de>
Subject:  Spanish Corpora

-------------------------------- Message 1 -------------------------------

Date:  Fri, 25 Sep 1998 15:03:04 +0200 (MET DST)
From:  Eva Remberger <eremberg at spinfo.uni-koeln.de>
Subject:  Spanish Corpora

Dear list members,

Here are the results, together with a list of the people who kindly
sent me suggestions and hints in response to the question I posted to
the LINGUIST List on Friday, 18 September.

My question was as follows:
>Dear list members,
>
>it's a while I'm looking for Spanish Corpora of business
>Spanish. Does anybody know if there are Spanish Newspapers on CD-ROM
>(eg. all the issues of one year as it is possible for the german
>newspaper Sueddeutsche Zeitung)? I tried to contact EL PAIS but never
>received an answer.
>
>Actually, I would be interested in any kind of Corpus of contemporary
>Spanish (mainly european), - to buy or not to buy - but 'economia'-
>arguments would be even greater.
>
>Thank you for an answer. Of course, I will post a message with the
>results.

______________________________________________________________________

I want to thank:

Andreas Eisele
Antoine Consigny
Valerie Mapelli
Iain Downs
Purificacion Fdez-Nistal
Susana Sotelo Docio
Eva Easton
Leonel Ruiz Miyares
Jose Luis Sancho
M.M.W.Pollmann
Raphael Salkie
Rene' Schneider
______________________________________________________________________

The summary of the results:
_______________________________________________________________________

Among the commercial corpus providers there is ELRA:
http://www.icp.grenet.fr/ELRA/cata/tabtext.html

They have a Multilingual Corpus (MLCC) drawn from 6 European
financial newspapers (Het Financieele Dagblad, Handelsblatt, Financial
Times, Le Monde, Il Sole 24 Ore, Expansion); the Spanish subcorpus
(Expansion) has about 10 million words (21.10.1991-24.10.91 and
14.5.94-27.12.94). The entire corpus is available via ELRA at the
following prices:

- For ELRA members for research use: 360 ECU
- For non members for research use: 750 ECU

- --------------------------------------------------------------------
Another commercial publisher of research material and provider of
newspapers on CD-ROM is NewsBank, which offers Noticias en Espanol on
monthly CD-ROMs:
http://www.newsbank.com/schools/high/spanish.html
- ---------------------------------------------------------------------
Yet another commercial service is ProQuest, which seems to carry El
Norte and Reforma (Mexico):
http://www.umi.com/hp/WhatWeDo.html
- -----------------------------------------------------------------
There is apparently a CD-ROM edition of the 1994 volume of El Mundo (on
two disks); the text is in ASCII format and classified by category
(economy, national, etc.). I'm not sure whether it is still available.
- ------------------------------------------------------------------
There is a collection of links to Spanish online newspapers at:
http://www.newslink.org/euspan.html
- ----------------------------------------------------------------------
The Language Technology Group has a website of FAQs about corpora
(the most interesting part is probably the tools section):
http://www.ltg.ed.ac.uk/helpdesk/faq/index.html#Texts0040
- ---------------------------------------------------------------------
El Observatorio Espanol de Industrias de la Lengua could be
interesting; it also has some further links:
http://www.cervantes.es/internet/acad/oeil/mar_oeil.htm (click on
"recursos linguisticos")
- --------------------------------------------------------------------
There are several corpora available at the Department of Romance
Languages of the University of Goeteborg (Banco de Datos de Prensa
Espanola 1977, Banco de Datos de Once Novelas Espanolas 1951-1971, and
a concordance based on the Corpus oral de referencia del Espanol
contemporaneo):  http://rom.gu.se/~romgb/Corpora.html
- ---------------------------------------------------------------------
Professor Barry Ife, at the School of Humanities, King's College
London, is reported to be compiling a large corpus of modern Spanish.
barry.ife at kcl.ac.uk
- ------------------------------------------------------------------------
There is a Spanish newspaper corpus consisting of 200 texts from
Latin American newspapers on CD-ROM (TIFF and ASCII versions). The
corpus comprises 39,081 tokens and can be purchased from the
Information Science Research Institute / University of Nevada at Las
Vegas
4505 Maryland Parkway
Las Vegas, Nevada 89154-4201
For information contact ISRI by
Phone: +1 702 895-3338
Fax: +1 702 895-1560
E-mail: isri-info at isri.unlv.edu
- -------------------------------------------------------------------
At the University of Murcia there is the CUMBRE Corpus; contact Prof.
Aquilino Sanchez: asanchez at fcu.um.es
- -------------------------------------------------------------
The CRATER corpus consists of morphosyntactically tagged texts from
the telecommunications domain: ftp.ling.lancs.ac.uk
- ---------------------------------------------------------------
Dr. Purificacion Fdez-Nistal and the Instituto de Terminologia
Bilingue y Traduccion Especializada (ITBYTE) at the Universidad de
Valladolid (Spain) are in the process of building their own corpus.
- ---------------------------------------------------------------
Ing. Leonel Ruiz Miyares (Director of the Applied Linguistics Centre,
Santiago de Cuba) maintains a Spanish corpus of children's vocabulary.
(Incidentally, there is also a European Spanish corpus of child
language, the MARIA corpus: http://www.sis.ucm.es/Spanish/)
- -------------------------------------------------------------
The Lingua project is an EU-funded project on multilingual
concordancing. So far they have only English, French, German, Italian,
Greek and Danish texts, but they are considering adding Spanish and
Portuguese: http://www.loria.fr/equipes/dialogue/lingua
- ------------------------------------------------------

		Thanks a lot 		Eva Remberger


-
_________________________________________________________________
			Sprachliche Informationsverarbeitung
Eva Maria Remberger	Philosophische Fakultaet
			Universitaet zu Koeln
			Albertus-Magnus-Platz
			D-50923 Koeln
- ---------------------------------------------------------------
	Visit our web-site at:  http://www.spinfo.uni-koeln.de
________________________________________________________________


---------------------------------------------------------------------------
LINGUIST List: Vol-9-1330


