9.498, Calls: Computational Ling, Ling Resources

Tue Mar 31 12:03:45 UTC 1998

LINGUIST List:  Vol-9-498. Tue Mar 31 1998. ISSN: 1068-4875.

Subject: 9.498, Calls: Computational Ling, Ling Resources

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Editors:            Brett Churchill <brett at linguistlist.org>
                    Martin Jacobsen <marty at linguistlist.org>
                    Elaine Halleck <elaine at linguistlist.org>
                    Anita Huang <anita at linguistlist.org>
                    Ljuba Veselinova <ljuba at linguistlist.org>
                    Julie Wilson <julie at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/

Editor for this issue: Anita Huang <anita at linguistlist.org>
 ==========================================================================

Please do not use abbreviations or acronyms for your conference unless
you explain them in your text.  Many people outside your area of
specialization will not recognize them. Also, if you are posting a
second call for the same event, please keep the message short.  Thank
you for your cooperation.

=================================Directory=================================

1)
Date:  Mon, 30 Mar 1998 18:01:15 +0200
From:  "LOT (Christien Bok)" <lot at let.ruu.nl>
Subject:  Storage and Computation in Linguistics

2)
Date:  Tue, 31 Mar 1998 11:28:45 BST
From:  Wim Peters <W.Peters at dcs.shef.ac.uk>
Subject:  Distributing and Accessing Linguistic Resources

-------------------------------- Message 1 -------------------------------

Date:  Mon, 30 Mar 1998 18:01:15 +0200
From:  "LOT (Christien Bok)" <lot at let.ruu.nl>
Subject:  Storage and Computation in Linguistics

SECOND ANNOUNCEMENT/CALL FOR PAPERS

Congress on:

"Storage and Computation in Linguistics"

Utrecht Institute of Linguistics OTS

UTRECHT, The Netherlands, October 19th, 20th and 21st, 1998

**************************************************
Deadline for submission of abstracts: May 15th, 1998
**************************************************

On the occasion of its tenth anniversary, the Utrecht Institute of
Linguistics OTS is organizing a three-day international congress from
October 19th through October 21st 1998. The theme of this congress is
"Storage and Computation in Linguistics". Invited speakers include:
- Steve Pinker (evening lecture)
- Ray Jackendoff (keynote lecture), Frans Zwarts: the architecture of
  the language faculty
- Harald Clahsen, Steven Gillis: language acquisition
- John Ohala, Geert Booij: language change
- Sally Thomason, Pieter Muysken: language variation
- Nicholas Asher, Frans van Eemeren: discourse analysis
- Ed Keenan & Ed Stabler, Jan Koster: grammar design

Two distinct cognitive resources that people may employ in interpreting and
producing linguistic utterances are, on the one hand, memory, and, on the
other, computational procedures. An utterance may be assigned a certain
structure and interpretation because it is recognized as an instance of a
pattern that is stored in memory, or because computational procedures build
up a complex representation of that pattern. In linguistics, this contrast
is usually identified with the contrast between lexicon and grammar. In the
context of this congress, the distinction is broadly conceived as a tool for
exploring our understanding of language structure and language use.

The relation between storage and computation will be analysed on the basis
of a broad range of empirical questions, concerning issues in the
representation and acquisition of linguistic knowledge, the foundations of
language and information, and the cognitive and computational aspects of
language use and processing. Implications of the distinction between storage
and computation will be discussed for six different domains of linguistic
inquiry:
- the architecture of the language faculty
- language acquisition
- language change
- language variation
- discourse analysis
- grammar design

The format of the congress is as follows:
(1)	A number of well-known linguistic scholars of different persuasions and
from different subdisciplines are asked to contribute invited papers
relating to the congress theme. There will be two invited speakers for each
of the six domains of linguistic inquiry mentioned above.
(2)	There will be around twenty slots for presentations of selected papers.
Each selected paper will be allotted 25 minutes, including discussion. There
will only be plenary sessions.

------------------------------------------------------------
               LOT

Landelijke Onderzoekschool Taalwetenschap
Netherlands Graduate School of Linguistics

             Trans 10
             3512 JK Utrecht
             Phone: +31 30 2536006
             Fax: +31 30 2536000

-------------------------------- Message 2 -------------------------------

Date:  Tue, 31 Mar 1998 11:28:45 BST
From:  Wim Peters <W.Peters at dcs.shef.ac.uk>
Subject:  Distributing and Accessing Linguistic Resources

		      **********************
		      Call for participation
	  ***********************************************
	  Distributing and Accessing Linguistic Resources
	  ***********************************************

May 27th, 1998

This workshop is part of the First International Conference on Language
Resources and Evaluation at the University of Granada, May 26th to 30th
1998 (see
	  http://ceres.ugr.es/~rubio/elra.html
for details and how to register).

The workshop will discuss ways to increase the efficacy of linguistic
resource distribution and programmatic access, and work towards the
definition of a new method for these tasks based on distributed processing
and object-oriented modelling with deployment on the WWW.

Organizers: Yorick Wilks, Wim Peters, Hamish Cunningham, Remi Zajac

Provisional Programme
-------------------

Panel discussion:

Distributing and Accessing Linguistic Resources
Khalid Choukri, Eduard Hovy, Judith Klavans, Yorick Wilks, Antonio Zampolli

Full papers:

Common Formats of MT User Dictionaries and Environments for
Exchanging Them as a Part of AAMT Activities
S. Kamei, E. Itoh, M. Fujii, T. Hirai, Y. Saitoh, M. Takahashi, T. Hiyama,
K. Muraki
NEC/Toshiba/Sharp/Fujitsu/Kyushu Matsushita, Japan

Distributed Thesaurus Storage and Access in a Cultural Domain Application
S. Boutsis, B. Georgantopoulos, S. Piperidis
Institute for Language and Speech Processing, Athens

Linguistic Research Utilizing the EDR Electronic Dictionary as a
Linguistic Resource
T. Ogino
EDR, Japan

Corpus-based Research using the Internet
D. Broeder, H. Brugman, A. Russel, P. Wittenburg, R. Piepenbrock
Max Planck Institute for Psycholinguistics/CELEX Centre for
Lexical Expertise, Nijmegen

An Architecture for Distributed NLP Objects
R. Zajac
New Mexico State University

A New Model for Language Resource Access and Distribution
W. Peters, H. Cunningham, Y. Wilks, C. McCauley
University of Sheffield

Posters:

TRACTOR: TELRI Research Archive of Computational Tools and Resources
R. Krishnamurthy
University of Birmingham

The CUE Corpus Access Tool
O. Mason
University of Birmingham

Web-Surfing the Lexicon
D. Cabrero, M. Vilares, L. Docampo, S. Sotelo
Ramon Pineiro Research Centre / Universities of Coruna and Santiago

Exploring Distributed MT
O. Streiter, A. Schmidt-Wigger, U. Reuther, C. Pease
IAI Saarbruecken

A Proposal for an On-line Lexical Database
P. Cassidy
Micra, Inc.

Workshop Scope and Aims
---------------------

In general the reuse of NLP data resources (such as lexicons or corpora)
has exceeded that of algorithmic resources (such as lemmatisers or parsers).
However, there are still two barriers to data resource reuse:

1)  each resource has its own representation syntax and corresponding
    programmatic access mode (e.g. SQL for CELEX, C or Prolog for WordNet,
    SGML for the BNC);

2)  resources must generally be installed locally to be usable (and of
    course precisely how this happens, what operating systems are supported
    etc. varies from case to case).

The consequences of 1) are that although resources share some structure in
common (lexicons are organised around words, for example) this commonality
is wasted when it comes to using a new resource (the developer has to learn
everything afresh each time) and that work which seeks to investigate or
exploit commonalities between resources (e.g. to link several lexicons to an
ontology) has to first build a layer of access routines on top of each
resource. So, for example, if we wish to do task-based evaluation of lexicons
by measuring the relative performance of an information extraction system
with different instantiations of lexical resource, we might end up writing
code to translate several different resources into SQL or SGML.
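
The access-routine layer described above can be pictured as a set of thin
adapters that present one interface over differently stored resources. The
Python sketch below is illustrative only. The class names, the lookup()
method, the coverage() measure and the toy data are all assumptions made for
this example; they do not reflect the actual interfaces of CELEX, WordNet or
the BNC, and Python is used here only for brevity.

```python
from abc import ABC, abstractmethod


class LexiconAdapter(ABC):
    """Hypothetical common interface; not the API of any real resource."""

    @abstractmethod
    def lookup(self, word):
        """Return the sorted part-of-speech tags recorded for a word."""


class TableLexicon(LexiconAdapter):
    """Stands in for a resource stored as relational rows (word, pos)."""

    def __init__(self, rows):
        self._rows = rows

    def lookup(self, word):
        return sorted({pos for w, pos in self._rows if w == word})


class MarkupLexicon(LexiconAdapter):
    """Stands in for a resource stored as tagged text such as 'run/VERB'."""

    def __init__(self, tagged_text):
        self._tokens = [t.rsplit("/", 1) for t in tagged_text.split()]

    def lookup(self, word):
        return sorted({pos for w, pos in self._tokens if w == word})


def coverage(lexicon, words):
    """Toy task-based measure: fraction of the words the lexicon knows."""
    words = list(words)
    return sum(1 for w in words if lexicon.lookup(w)) / len(words)


# The same evaluation code runs over both storage formats.
table = TableLexicon([("run", "VERB"), ("run", "NOUN"), ("dog", "NOUN")])
markup = MarkupLexicon("run/VERB cat/NOUN fish/NOUN")
test_words = ["run", "dog", "cat", "fish"]
scores = {type(lex).__name__: coverage(lex, test_words)
          for lex in (table, markup)}
```

With adapters of this kind, the evaluation routine is written once, and each
new resource costs one adapter rather than a new translation into SQL or SGML.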

The consequence of 2) is that there is no way to "try before you buy": no
way to examine a data resource for its suitability for your needs before
licensing it. Correspondingly there is no way for a resource provider to
expose limited access to their products for advertising purposes, or gain
revenue through piecemeal supply of sections of a resource.
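
One way to picture "limited access" is a wrapper that serves a fixed number
of lookups before requiring a licence. The Python sketch below is a
hypothetical illustration: the PreviewLexicon class, its quota policy and the
toy entries are assumptions for this example, not a description of any
existing server or of the proposed one.

```python
class PreviewLexicon:
    """Hypothetical wrapper exposing only a quota of lookups to a trial user."""

    def __init__(self, entries, max_lookups):
        self._entries = dict(entries)
        self._remaining = max_lookups

    def lookup(self, word):
        # Every call counts against the quota, whether or not the word exists.
        if self._remaining <= 0:
            raise PermissionError("trial quota used up; a licence is required")
        self._remaining -= 1
        return self._entries.get(word)


preview = PreviewLexicon({"dog": "NOUN", "run": "VERB"}, max_lookups=2)
first = preview.lookup("dog")    # served from the trial quota
second = preview.lookup("fish")  # unknown word, still counted

try:
    preview.lookup("run")
    quota_exhausted = False
except PermissionError:
    quota_exhausted = True
```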

This workshop will discuss ways to overcome these barriers. The proposers
will discuss a new method for distributing and accessing language resources
involving the development of a common programmatic model of the various
resource types, implemented in CORBA IDL and/or Java, along with a
distributed server for non-local access. This model is being designed as
part of the GATE project (General Architecture for Text Engineering:
http://www.dcs.shef.ac.uk/research/groups/nlp/gate/) and goes under the
provisional title of an Active CREOLE Server. (CREOLE: Collection of REusable
Objects for Language Engineering. Currently CREOLE supports only algorithmic
objects, but will be extended to data objects.)

A common model of language data resources would be a set of
inheritance hierarchies making up a forest or set of graphs. At
the top of the hierarchies would be very general abstractions
from resources (e.g. lexicons are about words); at the leaves
would be data items that were specific to individual resources.
Programmatic access would be available at all levels, allowing
the developer to select an appropriate level of commonality for
each application.
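
A minimal sketch of such a hierarchy, written in Python for brevity rather
than in the CORBA IDL or Java mentioned above, might look as follows. All
class names, methods and data here are assumptions made for illustration, and
a single-rooted tree is used where the text allows a forest or set of graphs.

```python
class Resource:
    """Root abstraction: every resource has a name."""

    def __init__(self, name):
        self.name = name


class Lexicon(Resource):
    """General level: 'lexicons are about words'."""

    def __init__(self, name, entries):
        super().__init__(name)
        self._entries = entries  # word -> resource-specific record

    def words(self):
        return sorted(self._entries)


class SenseLexicon(Lexicon):
    """More specific level: records also list senses (assumed structure)."""

    def senses(self, word):
        return self._entries[word]["senses"]


def vocabulary_overlap(a, b):
    """Written against the Lexicon level, so it accepts any subclass."""
    return sorted(set(a.words()) & set(b.words()))


plain = Lexicon("toy-wordlist", {"bank": {}, "river": {}})
rich = SenseLexicon("toy-senses", {
    "bank": {"senses": ["institution", "riverside"]},
    "loan": {"senses": ["credit"]},
})

shared = vocabulary_overlap(plain, rich)  # uses only the general level
bank_senses = rich.senses("bank")         # uses the more specific level
```

Code that needs only words can work at the Lexicon level across resources,
while code that needs senses commits to the more specific class.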

Note that although an exciting element of the work could be to
provide algorithms to dynamically merge common resources, what
we're suggesting initially is not to develop anything
substantively new, but simply to improve access to existing
resources. This is NOT a new standards initiative, but a way to
build on previous initiatives.

Of course, the production of a common model that fully expressed all the
subtleties of all resources would be a large undertaking, but we believe
that it can be done incrementally, with useful results at each stage. Early
versions will stop decomposing the object structure of resources at a fairly
high level, leaving the developer to handle the data structures native to
the resources at the leaves of the forest. There should still be a
substantial benefit in uniform access to higher level structures.

Program Committee
---------------

Yorick Wilks
Hamish Cunningham
Wim Peters
Remi Zajac
Roberta Catizone
Paola Velardi
Maria Teresa Pazienza
Roberto Basili
Bran Boguraev
Sergei Nirenburg
James Pustejovsky
Ralph Grishman
Christiane Fellbaum
