9.577, FYI: ELRA Focus,Parser Link,OTA Web Testers

LINGUIST Network linguist at linguistlist.org
Thu Apr 16 19:23:26 UTC 1998


LINGUIST List:  Vol-9-577. Thu Apr 16 1998. ISSN: 1068-4875.

Subject: 9.577, FYI: ELRA Focus,Parser Link,OTA Web Testers

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Editors:            Brett Churchill <brett at linguistlist.org>
                    Martin Jacobsen <marty at linguistlist.org>
                    Elaine Halleck <elaine at linguistlist.org>
                    Anita Huang <anita at linguistlist.org>
                    Ljuba Veselinova <ljuba at linguistlist.org>
                    Julie Wilson <julie at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/


Editor for this issue: Martin Jacobsen <marty at linguistlist.org>

=================================Directory=================================

1)
Date:  Mon, 6 Apr 1998 13:49:28 +0200 (MET DST)
From:  info-elra at calva.net (Valerie Mapelli)
Subject:  ELRA Focus - MLCC Multilingual Corpora for Co-operation

2)
Date:  Thu, 16 Apr 1998 00:26:20 -0400
From:  Doug Beeferman <Doug_Beeferman at cuff.link.cs.cmu.edu>
Subject:  Link Grammar Parser: http://www.link.cs.cmu.edu/link/

3)
Date:  Thu, 16 Apr 1998 10:21:41 +0100 (BST)
From:  Oxford Text Archive <archive at sable.ox.ac.uk>
Subject:  Testers wanted for new OTA website

-------------------------------- Message 1 -------------------------------

Date:  Mon, 6 Apr 1998 13:49:28 +0200 (MET DST)
From:  info-elra at calva.net (Valerie Mapelli)
Subject:  ELRA Focus - MLCC Multilingual Corpora for Co-operation


            EUROPEAN LANGUAGE RESOURCES ASSOCIATION
                         ELRA Focus
             =====================================


                MLCC Multilingual Corpora  for Co-operation

A collection of newspaper articles from financial newspapers in 6
languages (Dutch, English, French, German, Italian and Spanish) and a
set of parallel texts in the 9 European Union official languages (as
of 1993)

              =====================================

The current catalogue of ELRA consists of more than 500 language
resources (!) available for speech, written or terminology
work. This message is a reminder of the availability of
one of them, namely the MLCC Multilingual Corpora for Co-operation.

The MLCC text corpus has two main components - one set to allow
comparable studies to be carried out in different languages and one
set as the basis for translation studies.

The first set is referred to as the Polylingual Document Collection
(ELRA-W0006), a collection of newspaper articles from financial
newspapers in 6 languages (Dutch, English, French, German, Italian and
Spanish). It consists of the following sub-corpora:

Dutch - "Het Financieele Dagblad" - 1992-1993 The corpus contains
articles from the Dutch financial newspaper "Het Financieele Dagblad"
editions of 2nd January 1992 through to 24th December 1993. It
contains around 8.5 million words of text.

English - "The Financial Times" - 1993 The corpus contains articles
from the British financial newspaper "The Financial Times" editions
from the year 1993. The corpus contains around 30 million words.

French - "Le Monde" - 1992-1993 A corpus of articles from the French
newspaper "Le Monde", consisting of two years' worth (1992-1993) of
articles on financial subjects, approximately 10 million words.

German - "Handelsblatt" - 1986-1988 This subcorpus consists of
articles from the period 02.01.1986 to 15.06.1988.  It contains some
33 million words. It may be possible to obtain more recent articles
from "Handelsblatt".

Italian - "Il Sole 24 Ore" - 1992-1993 The corpus described here
contains articles from the Italian financial newspaper "Il Sole 24
Ore" from the year 1992. This corpus contains some 1.88 million
words. The SGML-markup was done by the University of Edinburgh.

Spanish - "Expansion" - 1994 This subcorpus contains articles from the
Spanish financial newspaper "Expansion" editions from 21.10.1991 to
24.10.1991 and 14.05.1994 to 27.12.1994. It contains some 10 million
words.

    Price for ELRA members:
        for research use: 360 ECU
        for commercial use: 1500 ECU

    Price for non-members:
        for research use: 750 ECU
        for commercial use: 3200 ECU

The second set is a Multilingual Parallel Corpus (ELRA-W0007)
consisting of translated data in nine European languages: Danish,
Dutch, English, French, German, Greek, Italian, Portuguese and
Spanish. The parallel data, provided by the European Commission,
comprises two sub-corpora from the Official Journal of the European
Communities:

Official Journal of the European Commission, C Series: Written
Questions 1993 Records of questions and answers regarding European
Community matters.  The data is regularly published as one section of
the C Series of the Official Journal of the European Community in all
official languages (previously nine). This corpus contains written
questions asked by members of the European Parliament and
corresponding answers from the European Commission in 9 parallel
versions. The total size of the corpus is approximately 10.2 million
words (ca. 1.1 million words per language).

Official Journal of the European Commission, Annex: Debates of the
European Parliament 1992-1994 This parallel corpus consists of the
records of Parliamentary sittings published as an annex to the Official
Journal of the European Community, "Debates of the European
Parliament". The
Parliamentary Debates are a record of what was said by members of the
meeting as well as written input provided to the meeting. The original
data from which the translations are produced consist of a transcript
of the sittings, each member speaking in the language of his
choice. The final version consists of nine parallel versions of the
material. The texts delivered comprise the Debates of Parliament from
January 1992 to July 1994. This sub-corpus contains some 5 to 8
million words per language.

   Price for ELRA members:
	for research use: 120 ECU
	for commercial use: 480 ECU
   Price for non-members:
        for research use: 200 ECU
        for commercial use: 800 ECU

     ********************************************
       For more information, please contact:
       ELRA/ELDA
       55-57 rue Brillat Savarin
       75013 PARIS
       Tel: +33 1 43 13 33 33
       Fax: +33 1 43 13 33 30
       E-mail: info-elra at calva.net
       http://www.icp.grenet.fr/ELRA/home.html
     ********************************************


-------------------------------- Message 2 -------------------------------

Date:  Thu, 16 Apr 1998 00:26:20 -0400
From:  Doug Beeferman <Doug_Beeferman at cuff.link.cs.cmu.edu>
Subject:  Link Grammar Parser: http://www.link.cs.cmu.edu/link/


We would like to draw your attention to the release of the new version
of the Link Grammar Parser, version 3.0.

The Link Grammar Parser is a syntactic parser of English, based on
link grammar, an original theory of English syntax.  Given a sequence
of words, the system assigns to it a syntactic structure, composed of
a set of arcs or "links" of different kinds, connecting pairs of
words. The parser has a dictionary of about 60000 word-forms; it has
coverage of a wide variety of syntactic constructions, many idioms,
and capitalization and punctuation phenomena.  It is able to make
guesses about the syntactic categories of unknown words based on
context. It is also robust, and can assign structure to sentences even
when it cannot parse them completely.
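
As a rough illustration of the kind of structure described above, a
set of labelled links connecting pairs of words can be sketched in a
few lines of Python. This is not the parser's API or its actual
output: the Link type, the neighbours helper, the example sentence and
the labels "D", "S" and "O" are all assumptions made up for this
sketch.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One arc between two words of a sentence (hypothetical type)."""
    left: int    # index of the left-hand word
    right: int   # index of the right-hand word
    label: str   # kind of link; the labels used here are illustrative only


def neighbours(links, word_index):
    """Return the sorted indices of words directly linked to word_index."""
    found = set()
    for link in links:
        if link.left == word_index:
            found.add(link.right)
        elif link.right == word_index:
            found.add(link.left)
    return sorted(found)


# A hand-written example, not produced by the Link Grammar Parser.
words = ["the", "cat", "chased", "a", "mouse"]
links = [
    Link(0, 1, "D"),  # the -- cat
    Link(1, 2, "S"),  # cat -- chased
    Link(3, 4, "D"),  # a -- mouse
    Link(2, 4, "O"),  # chased -- mouse
]

# Show each link as a pair of words joined by its label.
for link in links:
    print(f"{words[link.left]} --{link.label}-- {words[link.right]}")
```

Calling neighbours(links, 2) in this sketch lists the words linked to
"chased". The real system's data structures and link types are
documented at the website given below.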

The system is written in C, and runs under Unix and Windows.

Since our last version (version 2.0, in Fall 1995), we have made a
number of improvements to the parser. Its speed is greatly enhanced;
its coverage is significantly improved.  We have also incorporated a
"panic mode", which allows the parser to recover some structure on
long sentences in a short amount of time.  We have also developed an
API for the system.  This allows the parser to be easily integrated
into your own applications.

At the Link Parser website (http://www.link.cs.cmu.edu/link/) you can
try the parser out for yourself.  This website also contains more
information and detailed documentation of the parser.  You are welcome
to download the system from the website and use it for personal or
academic purposes.  If you intend to use it for commercial purposes,
please contact us.  Contact information, and information on the Link
Group at Carnegie Mellon, can be found off the Link Group home page at
http://www.link.cs.cmu.edu/


     Davy Temperley        Daniel Sleator        John Lafferty
     dt3 at columbia.edu      sleator at cs.cmu.edu    lafferty at cs.cmu.edu


-------------------------------- Message 3 -------------------------------

Date:  Thu, 16 Apr 1998 10:21:41 +0100 (BST)
From:  Oxford Text Archive <archive at sable.ox.ac.uk>
Subject:  Testers wanted for new OTA website


The Oxford Text Archive is launching a state-of-the-art web service
later in the year, reflecting our new status as a Service Provider for
the UK's national Arts and Humanities Data Service.

Before this web site goes live, we need feedback from all types of
user.  So whether you are new to electronic text or an expert in the
field, we invite you to visit our site and use our feedback form to
tell us what you think.

As always, the OTA's homepage remains

http://ota.ahds.ac.uk/

but throughout this period of testing, users will have the option to
visit either our current site, or our new experimental service.

NB: In order to fully appreciate this service, we recommend that you
use either Netscape Navigator 4 or IE 3 (or better).

Features of the new OTA site include:

- an online catalogue of all our texts, whether online or offline
- a facility to create a corpus of texts
- a download facility for TEI encoded texts that allows you to
  choose from a variety of different formats
- online tools to help you prepare your texts in SGML
- a listing of future events, as well as papers from previous
  workshops and conferences
- a FAQ, based on the OTA's 22 years of operation
- a search tool and site map to help you find your way around the site
- an SGML software repository
- "Guides to Good Practice" on the creation and documentation of
  electronic texts (in preparation)


-----
Oxford Text Archive
http://ota.ahds.ac.uk
info at ota.ahds.ac.uk
+44-1865-273 238
13 Banbury Road, Oxford, OX2 6NN, UK




---------------------------------------------------------------------------
LINGUIST List: Vol-9-577


