11.1486, Calls: Comparing Corpora, NLP/Information Retrieval

The LINGUIST Network linguist at linguistlist.org
Fri Jul 7 16:00:54 UTC 2000


LINGUIST List:  Vol-11-1486. Fri Jul 7 2000. ISSN: 1068-4875.

Subject: 11.1486, Calls: Comparing Corpora, NLP/Information Retrieval

Moderators: Anthony Rodrigues Aristar, Wayne State U.<aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Reviews: Andrew Carnie: U. of Arizona <carnie at linguistlist.org>

Associate Editors:  Ljuba Veselinova, Stockholm U. <ljuba at linguistlist.org>
		    Scott Fults, E. Michigan U. <scott at linguistlist.org>
		    Jody Huellmantel, Wayne State U. <jody at linguistlist.org>
		    Karen Milligan, Wayne State U. <karen at linguistlist.org>

Assistant Editors:  Lydia Grebenyova, E. Michigan U. <lydia at linguistlist.org>
		    Naomi Ogasawara, E. Michigan U. <naomi at linguistlist.org>
		    James Yuells, Wayne State U. <james at linguistlist.org>

Software development: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
                      Sudheendra Adiga, Wayne State U. <sudhi at linguistlist.org>
                      Qian Liao, E. Michigan U. <qian at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded jointly by Eastern Michigan University,
Wayne State University, and donations from subscribers and publishers.


Editor for this issue: Jody Huellmantel <jody at linguistlist.org>
 ==========================================================================

As a matter of policy, LINGUIST discourages the use of abbreviations
or acronyms in conference announcements unless they are explained in
the text.

=================================Directory=================================

1)
Date:  Thu, 6 Jul 2000 12:14:26 EDT
From:  Priscilla Rasmussen <rasmusse at cs.rutgers.edu>
Subject:  ACL'2000-Comparing Corpora Workshop

2)
Date:  Thu, 6 Jul 2000 12:23:40 EDT
From:  Priscilla Rasmussen <rasmusse at cs.rutgers.edu>
Subject:  ACL'2000-Recent Advances in NLP and IR Workshop

-------------------------------- Message 1 -------------------------------

Date:  Thu, 6 Jul 2000 12:14:26 EDT
From:  Priscilla Rasmussen <rasmusse at cs.rutgers.edu>
Subject:  ACL'2000-Comparing Corpora Workshop



			FINAL CALL FOR PAPERS

			    ACL Workshop:

		          COMPARING CORPORA

			     October 2000

	    Hong Kong University of Science and Technology


THEME
=====

Anyone who has worked with corpora will be all too aware of
differences between them.  Depending on the differences, it may, or
may not, be reasonable to expect results based on one corpus to also
be valid for another.  It may, or may not, be appropriate for a
grammar, or parser, based on one to perform well on another.  It may,
or may not, be straightforward to port an application from a domain of
the first text type to a domain of the second.  Currently,
characterisations of corpora are mostly textual and at different
levels of generality.  A corpus is described as ``Wall Street
Journal'' or ``transcripts of business meetings'' or ``foreign
learners' essays (intermediate grade)''.  It would be desirable to be
able to place a new corpus in relation to existing ones, and to be
able to quantify similarities and differences.

Allied to corpus-similarity is corpus-homogeneity. An understanding of
homogeneity is a prerequisite to a measure of similarity -- it makes
little sense to compare a corpus sampled across many genres, like the
Brown, with a corpus of weather forecasts, without first accounting
for the one being broad, the other narrow.
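
As an illustration only, the two notions above can be sketched in a
few lines of Python.  This is a minimal sketch under stated
assumptions, not a method the workshop prescribes: similarity is
taken to be the cosine between relative word-frequency vectors, and
homogeneity is taken to be the average similarity between random
halves of a single corpus.  The function names, the chunk size and
the number of trials are all hypothetical choices made for the
example.

```python
from collections import Counter
import math
import random


def word_counts(tokens):
    """Count lower-cased word forms in a list of tokens."""
    return Counter(t.lower() for t in tokens)


def cosine_similarity(counts_a, counts_b):
    """Cosine between two relative-frequency vectors (0.0 if either is empty)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    dot = norm_a = norm_b = 0.0
    for word in set(counts_a) | set(counts_b):
        # Counter returns 0 for words missing from one corpus.
        freq_a = counts_a[word] / total_a
        freq_b = counts_b[word] / total_b
        dot += freq_a * freq_b
        norm_a += freq_a * freq_a
        norm_b += freq_b * freq_b
    return dot / math.sqrt(norm_a * norm_b)


def homogeneity(tokens, chunk_size=1000, trials=10, seed=0):
    """Average similarity between random halves of one corpus.

    The corpus is cut into fixed-size chunks, the chunks are shuffled,
    and the two halves of the shuffled list are compared.  Assumed
    definition for illustration; other definitions are possible.
    """
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens), chunk_size)]
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        rng.shuffle(chunks)
        half = len(chunks) // 2
        first, second = Counter(), Counter()
        for chunk in chunks[:half]:
            first.update(word_counts(chunk))
        for chunk in chunks[half:]:
            second.update(word_counts(chunk))
        scores.append(cosine_similarity(first, second))
    return sum(scores) / len(scores)
```

Under these assumptions, a similarity score between a broad corpus
and a narrow one would be read against the homogeneity score of each
corpus, since a broad corpus may score low even against itself.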

Given the centrality of corpora to contemporary language engineering,
it is remarkable how little research there has been to date on the
question.  Biber's work, coming from sociolinguistics, has made a
considerable impact, with various researchers in computational
linguistics taking forward the model (Biber 1989, 1995).  Studies in
text classification, genre and sublanguage are also salient, but it is
rarely evident how well the technologies developed in these fields are
suited to measuring corpus similarity or homogeneity.

The workshop will welcome contributions concerned with measuring and
comparing corpora using quantitative methods, from any field.


Where and when
==============

The workshop will last half a day and will be on either 7th or 8th
Oct, the main ACL conference being 3rd-6th October.  The venue will be
the same.

Submissions:
============

Submissions are limited to original, unpublished work. Papers may
not exceed 3200 words (exclusive of title page and references).
They must be received by July 8, 2000, in hard copy (4 copies)
OR postscript OR rtf format.  Electronic delivery is to

compcorp at itri.brighton.ac.uk

and hard copies are to be mailed to

Compcorp submission
ITRI
University of Brighton
Lewes Road
Brighton BN2 4GJ
United Kingdom


Important Dates:
================

  July 8, 2000              Submission (of full-length paper)
  August 17, 2000           Acceptance notice
  September 1, 2000         Camera-ready paper received
  October 7 or 8            Workshop date

	
Co-ordinators
=============
	
Adam Kilgarriff - University of Brighton, UK
Tony Berber Sardinha - Catholic University of Sao Paulo, Brazil

Programme committee
===================

Douglas Biber           Northern Arizona University
Jeremy Clear            University of Birmingham
Ted Dunning             MusicMatch Software, Inc.
Tomaz Erjavec           Jozef Stefan Institute, Slovenia
Pascale Fung            University of Science and Technology, Hong Kong
Greg Grefenstette       Xerox Research Centre Europe
Benoit Habert           LIMSI, France
Przemek Kaszubski       Adam Mickiewicz University, Poland
Adam Kilgarriff         University of Brighton
David Lee               University of Lancaster
Oliver Mason            University of Birmingham
Doug Oard               University of Maryland
Tony Rose               Canon Research
Tony Berber Sardinha    Catholic University of Sao Paulo, Brazil
George Tambouratzis     ILSP, Athens
Christopher Tribble     King's College, London University

Website
=======

http://www.itri.bton.ac.uk/events/compcorp


-------------------------------- Message 2 -------------------------------

Date:  Thu, 6 Jul 2000 12:23:40 EDT
From:  Priscilla Rasmussen <rasmusse at cs.rutgers.edu>
Subject:  ACL'2000-Recent Advances in NLP and IR Workshop


--------------------------------------------------------------------
                        SECOND CALL FOR PAPERS
 ACL'2000 Workshop on Recent Advances in Natural Language Processing
                      and Information Retrieval
                         October 7/8, 2000
             Hong Kong University of Science and Technology
--------------------------------------------------------------------


Aims and scope
--------------

This workshop aims at fostering the interaction between researchers in
the areas of Natural Language Processing (NLP) and Information
Retrieval (IR), and furthermore, promoting discussions on the current
and potential benefits of common approaches to related research
challenges. The central topic is the application of Language
Technologies to Information Retrieval, including (but not limited to):

* the role of lexical-syntactic information in mono- and multilingual
IR, including morphology, phrase detection and treatment, word sense
disambiguation adapted to IR needs, acquisition and use of lexical
resources, etc.

* empirical evidence regarding the use of NL techniques in different
retrieval scenarios, typification of such scenarios, and the
discussion of evaluation measures beyond precision/recall variants.

* interaction between NLP and IR techniques in topics related
to both areas such as Cross-Language and Interactive Text Retrieval,
Question Answering, Information Extraction, Text Summarization, Text
Data Mining, etc.
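
For reference, the set-based precision and recall that the second
topic takes as its baseline can be computed as in the minimal sketch
below.  The function name and the document identifiers are
hypothetical, and returning 0.0 for an empty set is a convention
assumed here, not something specified in this call.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers of the documents the system returned
    relevant:  identifiers of the documents judged relevant
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    # Assumed convention: an empty denominator gives 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents returned, two of them among the three relevant ones.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Both figures ignore ranking and treat relevance as binary, which is
one reason measures beyond these variants are of interest.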

The growing research and application possibilities provided by the
increased amount of networked information have motivated new attempts
to explore the relationship between NLP and IR.  For researchers in
IR, a compelling challenge is to move from (monolingual) document
retrieval within controlled text collections, to actually retrieving
information, rather than individual documents, from multilingual,
heterogeneous and dynamic webs of interlinked documents and online
services. The reciprocal challenge for NLP research is to scale up,
adapt and possibly reshape techniques and resources to help bridge the
gap between document and information retrieval in practical
applications. Papers describing pragmatic, empirically tested
approaches facing these issues are especially welcome.

Instructions for submissions
----------------------------

The format of submissions is identical to the one used for the main
conference, which can be found at http://www.cs.ust.hk/acl2000/fcfp.html.
Authors should fill in the "paper ID" field with "IR&NLP workshop".
The "Topic Area" and "session" fields should be left blank.

Papers must be submitted electronically, in postscript or pdf formats,
to both Program Chairs:

Judith Klavans, Columbia University
klavans at cs.columbia.edu

Julio Gonzalo, UNED
julio at ieec.uned.es

No hardcopy submission is required.

Program Committee
-----------------

Judith Klavans, Columbia University (Co-Chair)
Julio Gonzalo, UNED (Co-Chair)

Jamie Callan (CMU)
Bruce Croft (CIIR)
Eric Gaussier (Rank Xerox Grenoble)
Eduard Hovy (ISI/USC)
Christian Jacquemin (LIMSI)
Noriko Kando (NII Tokyo)
Bob Krovetz (NEC Princeton)
Mun-Kew Leong (Kent Ridge Digital Labs)
Carol Peters (IEI-CNR)
Mark Sanderson (Univ. of Sheffield)
Tomek Strzalkowski (GE)
Evelyne Tzoukermann (Lucent Technologies)
Felisa Verdejo (UNED)
Nina Wacholder (Columbia University)

Important dates
---------------

Deadline for submissions: July 15, 2000
Notification of acceptance: August 7, 2000
Camera-ready version: September 1, 2000
Workshop: October 7 or 8, 2000

Further Information
-------------------

Updated information about the workshop can be found at
http://sensei.ieec.uned.es/IRNLP-2000

---------------------------------------------------------------------------
LINGUIST List: Vol-11-1486
