New Corpus: "1997 Spanish Broadcast News"

Carlos Subirats Rüggeberg Carlos.Subirats at uab.es
Wed Jul 15 17:46:58 UTC 1998


INFOLING  Moderated list on Spanish linguistics
http://listserv.rediris.es/archives/infoling.html
Submissions: INFOLING at listserv.rediris.es
Editor: Carlos Subirats Rüggeberg <Carlos.Subirats at uab.es>
Contributors:
Paola Bentivoglio <pbentivo at reacciun.ve>, UCV
Eulalia de Bobes <ebobes at seneca.uab.es>, UAB
Mar Cruz <mcruz at lingua.fil.ub.es>, UB
Emma Martinell <martinell at lingua.fil.ub.es>, UB
____________________________________________________________

          New Corpus: "1997 Spanish Broadcast News"
                  Information provided by:
       Observatorio Español de Industrias de la Lengua
                     <oeil at cervantes.es>
           http://www.cervantes.es/oeil/Oeil0.htm
____________________________________________________________

       NEW RELEASE from the Linguistic Data Consortium
            1997 Spanish Broadcast News (HUB-4NE)

    This corpus contains a portion of the acoustic data
designated as the training set for the 1997 DARPA HUB-4
Spanish Benchmark. It contains speech and transcripts of
30 hours of broadcast news from the following sources:

    VOA
    Univision
    Televisa

All acoustic files are in NIST SPHERE format, without
compression. The sample data are 16-bit linear PCM, 16-kHz
sample frequency, single channel. Most files contain 30
minutes of recorded material, and some contain 60 or 120
minutes (approximately); the sampling format requires
roughly 2 megabytes (MB) per minute of recording, so the
file sizes are typically around 60 MB, with some files
ranging up to 120 or 240 MB.
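
The size figures above can be checked with a short calculation. The
following is a minimal sketch in Python, assuming decimal megabytes
(1 MB = 1,000,000 bytes) and counting only the audio samples, not the
SPHERE header; the function and constant names are illustrative and
are not part of the corpus documentation.

```python
# Storage estimate for uncompressed 16-bit, 16 kHz, single-channel PCM.
# Names below are hypothetical, chosen for this illustration only.
BYTES_PER_SAMPLE = 2       # 16-bit linear PCM
SAMPLE_RATE_HZ = 16_000    # 16 kHz sampling frequency
CHANNELS = 1               # single channel


def pcm_bytes(minutes: float) -> int:
    """Return the audio payload in bytes for a recording of `minutes`."""
    seconds = minutes * 60
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


if __name__ == "__main__":
    for minutes in (1, 30, 60, 120):
        size_mb = pcm_bytes(minutes) / 1_000_000  # decimal megabytes
        print(f"{minutes:>3} min: {size_mb:6.1f} MB")
```

One minute comes to 1,920,000 bytes, which matches the "roughly 2 MB
per minute" stated above, and a 30-minute file comes to about 57.6 MB.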

    The transcripts are in SGML format, using the same
markup conventions that have been applied to the other 1997
Broadcast News speech corpora (in English and Mandarin),
and are distributed by FTP, not on the CD-ROMs with the speech
data.

    Because of restrictions imposed by the copyright
holders, this corpus is available to 1998 LDC members only.

    If you would like to order a copy of this corpus,
please email your request to:

                   ldc at unagi.cis.upenn.edu

If you need additional information before placing your
order, or would like to inquire about membership in the
LDC, please send email or call (215)898-0464.

Further information about the LDC and its available corpora
can be accessed on the Linguistic Data Consortium WWW Home
Page at URL:

                  http://www.ldc.upenn.edu

Information is also available via ftp at ftp.cis.upenn.edu
under pub/ldc; for ftp access, please use "anonymous" as
your login name, and give your email address when asked for
a password.
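
The anonymous login described above could be scripted roughly as
follows. This is only a sketch using Python's standard ftplib module;
the host and path are the ones given in this announcement and may no
longer be reachable, and the function name and the ftp_factory
parameter are hypothetical, added so the logic can be exercised
without a network connection.

```python
from ftplib import FTP


def list_ldc_directory(email, host="ftp.cis.upenn.edu", path="pub/ldc",
                       ftp_factory=FTP):
    """Log in anonymously and return the file names found under `path`.

    `ftp_factory` is an assumption of this sketch: it defaults to
    ftplib.FTP but can be swapped for a stand-in object in tests.
    """
    ftp = ftp_factory(host)  # ftplib.FTP connects when given a host
    try:
        # "anonymous" as the login name, email address as the password
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(path)
        return ftp.nlst()
    finally:
        ftp.quit()

# Example call (requires network access and a live server):
# names = list_ldc_directory("you@example.org")
```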

----------------------------------------------------
Formats for submitting information to INFOLING.
Send to LISTSERV at LISTSERV.REDIRIS.ES
the command:	INFO INFOLING
----------------------------------------------------



