Information on Spanish corpora

Carlos Subirats Rüggeberg Carlos.Subirats at uab.es
Mon Sep 28 12:19:09 UTC 1998


INFOLING  Moderated list on Spanish linguistics
http://listserv.rediris.es/archives/infoling.html
Submissions: INFOLING at listserv.rediris.es
Editor: Carlos Subirats Rüggeberg <Carlos.Subirats at uab.es>
Collaborators:
Paola Bentivoglio <pbentivo at reacciun.ve>, UCV
Eulalia de Bobes <ebobes at seneca.uab.es>, UAB
Mar Cruz <mcruz at lingua.fil.ub.es>, UB
Emma Martinell <martinell at lingua.fil.ub.es>, UB
___________________________________________________________

            Information on Spanish corpora
      From: Eva Remberger <eremberg at spinfo.uni-koeln.de>
       Information distributed by: The Linguist List
                  http://linguistlist.org/
Thanks to Andreas Eisele, Antoine Consigny, Valerie
Mapelli, Iain Downs, Purificación Fdez.-Nistal, Susana
Sotelo Docio, Eva Easton, Leonel Ruiz Miyares, José Luis
Sancho, M.M.W. Pollmann, Raphael Salkie, René Schneider
___________________________________________________________

                     Summary of results

    - Among the commercial corpora there is ELRA:
      http://www.icp.grenet.fr/ELRA/cata/tabtext.html

    They have a multilingual corpus (MLCC) consisting of
6 European financial newspapers (Het Financieele Dagblad,
Handelsblatt, Financial Times, Le Monde, Il Sole 24 Ore,
Expansión); the Spanish subcorpus (Expansión) has about 10
million words (21.10.1991-24.10.91 and 14.5.94-27.12.94).
The entire corpus is available via ELRA at the following
prices:
    For ELRA members for research use: 360 ECU
    For non members for research use: 750 ECU


    - Another commercial publisher of research material
and provider of newspapers on CD-ROM is NewsBank; they
offer Noticias en Español on monthly CD-ROMs:
     http://www.newsbank.com/schools/high/spanish.html


    - Yet another commercial service is ProQuest; they
seem to have El Norte and Reforma (México):
            http://www.umi.com/hp/WhatWeDo.html


    - There is a CD-ROM edition of the 1994, 1996 and 1996
volumes of El Mundo (two CD-ROMs for each year); the text
is in ASCII format, classified into categories (economy,
national, etc.), and it is available.


    - There is a collection of links to Spanish online
newspapers at:
            http://www.newslink.org/euspan.html


    - The Language Technology Group has a corpora FAQ
website (the tools section is probably the most
interesting part):
 http://www.ltg.ed.ac.uk/helpdesk/faq/index.html#Texts0040


    - El Observatorio Español de Industrias de la Lengua
could be interesting; it also has some more links:
  http://www.cervantes.es/internet/acad/oeil/mar_oeil.htm
Click on Recursos lingüísticos.


    - There are several corpora available at the
Department of Romance Languages of the University of
Goeteborg (Banco de datos de Prensa Española 1977, Banco
de Datos de Once Novelas Españolas 1951-1971, A
Concordance based on the Corpus Oral de Referencia del
Español Contemporáneo):
            http://rom.gu.se/~romgb/Corpora.html


    - Professor Barry Ife, at the School of Humanities,
King's College (London), is reported to be compiling a
large corpus of modern Spanish:
                e-mail: barry.ife at kcl.ac.uk


    - There is a Spanish newspaper corpus consisting of 200
texts from Latin American newspapers on CD-ROM (TIFF and
ASCII versions). The corpus includes 39,081 tokens and is
available from:
Information Science Research Institute
University of Nevada at Las Vegas
4505 Maryland Parkway
Las Vegas, Nevada 89154-4201

For information contact ISRI by phone: +1 702 895-3338,
fax: +1 702 895-1560, e-mail: isri-info at isri.unlv.edu


    - At the University of Murcia there is the CUMBRE
Corpus; contact Prof. Aquilino Sanchez: asanchez at fcu.um.es


    - The CRATER corpus consists of morphosyntactically
tagged telecommunications texts; it is available by FTP
from: ftp.ling.lancs.ac.uk


    - Dr. Purificación Fdez.-Nistal and the Instituto de
Terminología Bilingüe y Traducción Especializada (ITBYTE)
at the Universidad de Valladolid (Spain) are in the
process of building their own corpus.


    - Ing. Leonel Ruiz Miyares, Director of the Applied
Linguistics Centre (Santiago de Cuba, Cuba), keeps a
Spanish corpus of children's vocabulary. By the way, there
is also a European Spanish corpus of child language, the
MARIA corpus:
               http://www.sis.ucm.es/Spanish/


    - The Lingua project is an EU-funded project on
multilingual concordancing; so far they have only
English, French, German, Italian, Greek and Danish texts,
but they are considering bringing in Spanish and
Portuguese:
        http://www.loria.fr/equipes/dialogue/lingua

----------------------------------------------------
Formats for sending information to INFOLING.
Send to LISTSERV at LISTSERV.REDIRIS.ES
the command:	INFO INFOLING
----------------------------------------------------



