7.432, FYI: Linguistic Data Consortium, V Simposio Internacional
The Linguist List
linguist at tam2000.tamu.edu
Fri Mar 22 17:27:44 UTC 1996
---------------------------------------------------------------------------
LINGUIST List: Vol-7-432. Fri Mar 22 1996. ISSN: 1068-4875. Lines: 120
Subject: 7.432, FYI: Linguistic Data Consortium, V Simposio Internacional
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu> (On Leave)
T. Daniel Seely: Eastern Michigan U. <dseely at emunix.emich.edu>
Associate Editor: Ljuba Veselinova <lveselin at emunix.emich.edu>
Assistant Editors: Ron Reck <rreck at emunix.emich.edu>
Ann Dizdar <dizdar at tam2000.tamu.edu>
Annemarie Valdez <avaldez at emunix.emich.edu>
Software development: John H. Remmers <remmers at emunix.emich.edu>
Editor for this issue: lveselin at emunix.emich.edu (Ljuba Veselinova)
---------------------------------Directory-----------------------------------
1)
Date: Tue, 19 Mar 1996 16:00:37 EST
From: ldc at unagi1k.cis.upenn.edu (LDC Office)
Subject: New Release from the LDC
2)
Date: Fri, 22 Mar 1996 12:14:22 +0100
From: cioni at ux1sns.sns.it (cioni)
Subject: V Simposio Internacional
---------------------------------Messages------------------------------------
1)
Date: Tue, 19 Mar 1996 16:00:37 EST
From: ldc at unagi1k.cis.upenn.edu (LDC Office)
Subject: New Release from the LDC
Announcing a NEW RELEASE from the
LINGUISTIC DATA CONSORTIUM
SPANISH NEWS TEXT COLLECTION
The Spanish News Corpus consists of journalistic text data from one
newspaper (El Norte, Mexico) and from the Spanish-language services of
three newswire sources: Agence France Presse, Associated Press
Worldstream, and Reuters. (The Reuters collection comprises two
distinct services: Reuters Spanish Language News Service and Reuters
Latin American Business Report.)
All text data are stored on one CD-ROM, in a standard compressed form.
The four sets of newswire data (AFP, APWS, and the two Reuters services)
are each organized as one data file per day of collection. The period
covered by these collections runs from December 1993 (for APWS and
Reuters) or May 1994 (AFP) through December 1995. (The El Norte
data, provided to us by INFOSEL Mexico, are arbitrarily grouped into
files of about 1 megabyte in size when uncompressed; date information
is not available for individual articles, but the general period of
the collection is 1993.)
The approximate amounts of data per source (when uncompressed) are
indicated below (in total megabytes and millions of words of text):
Source     MB   MW
-------------------
AFP       345   44
APWS      253   33
REUSL     333   41
REULA     233   23
INFOSEL   209   31
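
As a rough cross-check on the figures above, the table can be totaled and an average number of bytes per word computed for each source. The following is a minimal Python sketch that uses only the numbers in the table. Treating one megabyte as 10**6 bytes is an assumption, since the announcement does not say which convention is meant, and the function name is purely illustrative.

```python
# Figures copied from the table above: (megabytes, millions of words).
SOURCES = {
    "AFP": (345, 44),
    "APWS": (253, 33),
    "REUSL": (333, 41),
    "REULA": (233, 23),
    "INFOSEL": (209, 31),
}

def bytes_per_word(mb, mw):
    # Assumption: 1 MB = 10**6 bytes, so MB divided by MW gives bytes per word.
    return mb / mw

total_mb = sum(mb for mb, _ in SOURCES.values())
total_mw = sum(mw for _, mw in SOURCES.values())

for name, (mb, mw) in SOURCES.items():
    print(f"{name:8s} {mb:5d} MB {mw:4d} MW  ~{bytes_per_word(mb, mw):.1f} bytes/word")
print(f"{'TOTAL':8s} {total_mb:5d} MB {total_mw:4d} MW  ~{bytes_per_word(total_mb, total_mw):.1f} bytes/word")
```

Because the table gives approximate amounts, the per-word averages are only indicative; they also include the SGML markup, not just the article text.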
The presentation of text data in these collections is modeled on the
TIPSTER corpus. Within each data file, SGML tagging is used (1) to
mark article boundaries, (2) to delimit the text portion within each
article, and (3) to label various pieces of information about the
article that are external to the text content (e.g. headlines,
bylines, and so on).
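
To illustrate the three roles of the tagging described above, here is a minimal Python sketch that splits a data file into articles, pulls out the text portion, and collects the remaining tagged fields. The tag names (DOC, DOCNO, HEADLINE, TEXT) are assumptions based on common TIPSTER-style conventions; the announcement does not list the actual tags used on the CD-ROM, and the sample data below is invented. A regular-expression approach like this is a simplification and not a full SGML parser.

```python
import re

# Assumed TIPSTER-style tag names; the real corpus may use different ones.
DOC_RE = re.compile(r"<DOC\b[^>]*>(.*?)</DOC>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)
FIELD_RE = re.compile(r"<([A-Z]+)>(.*?)</\1>", re.DOTALL)

def parse_articles(data):
    """Return one dict per article with its external fields and its text."""
    articles = []
    for doc in DOC_RE.finditer(data):          # (1) article boundaries
        body = doc.group(1)
        text_match = TEXT_RE.search(body)      # (2) text portion
        text = text_match.group(1).strip() if text_match else ""
        # Drop the text portion so only the external fields remain.
        outside = TEXT_RE.sub("", body)
        fields = {name: value.strip()          # (3) headline, byline, etc.
                  for name, value in FIELD_RE.findall(outside)}
        articles.append({"fields": fields, "text": text})
    return articles

# Invented sample in the assumed format, for illustration only.
SAMPLE = """<DOC>
<DOCNO> EXAMPLE-0001 </DOCNO>
<HEADLINE> Titular de ejemplo </HEADLINE>
<TEXT>
Texto de ejemplo.
</TEXT>
</DOC>
<DOC>
<DOCNO> EXAMPLE-0002 </DOCNO>
<TEXT>
Otro texto de ejemplo.
</TEXT>
</DOC>
"""

for article in parse_articles(SAMPLE):
    print(article["fields"].get("DOCNO"), "|", article["text"])
```

Since the data files are stored in compressed form, they would need to be uncompressed before a function like this could be applied to them.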
The copyright holders of this text have requested that it be made
available to LDC members only. Because of its release date, this corpus is
available to 1995 and 1996 members. To obtain this corpus,
current LDC members must submit a signed User Agreement Form.
Inquiries about the corpus, requests for it, and questions about
becoming a member should be directed to ldc at unagi.cis.upenn.edu.
Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL
http://www.cis.upenn.edu/~ldc. Information is also available via ftp
at ftp.cis.upenn.edu under pub/ldc; for ftp access, please use
"anonymous" as your login name, and give your email address when asked
for a password.
------------------------------------------------------------------------
2)
Date: Fri, 22 Mar 1996 12:14:22 +0100
From: cioni at ux1sns.sns.it (cioni)
Subject: V Simposio Internacional
Dear colleagues, I am very pleased to announce that the Centro de
Linguistica Aplicada in Santiago de Cuba has opened a very interesting
home page, well worth visiting, at the address
http://web.cict.fr:8200/gril/Cuba.html
and that all of you who are interested in the V Simposio (to be held in
Santiago in January 1997) can find any information you may need at the
following address
http://web.cict.fr:8200/gril/Sympo.html
For further information, please contact lingapli at ceniai.cu (please note
that I am not involved in the organization).
Best regards, Lorenzo Cioni
------------------------------------------------------------------------
LINGUIST List: Vol-7-432.