Corpora: FINAL CALL: 'Web as Corpus' Special Issue of Computational Linguistics

Adam Kilgarriff adam.kilgarriff at itri.brighton.ac.uk
Wed Apr 17 11:29:12 UTC 2002


http://www.itri.bton.ac.uk/~Adam.Kilgarriff/wac_cfp.html


                          FINAL CALL FOR PAPERS
              SPECIAL ISSUE of COMPUTATIONAL LINGUISTICS
                            Web as Corpus


Guest editors

Adam Kilgarriff,      ITRI, University of Brighton
Gregory Grefenstette, Clairvoyance Corporation


The Web is an immense, multilingual, freely available corpus. As with
other large new corpora, computational linguists have been stimulated
by its presence.  Web research includes many of the most talked-about
papers of recent ACL and other meetings (e.g. Resnik, ACL '99; Brill,
"Does the web change everything?", ACL SIGNLL '01).

In comparison with most corpora studied to date, the web is
heterogeneous and noisy. Methods for handling the noise, and
extracting and exploiting subcorpora meeting particular criteria, are
being developed by a widening population ranging from students who
realise that it is an obvious place to obtain their corpus for free,
to companies who seek to use HLT techniques on datasets other than the
ones HLT researchers usually use.

NLP can both give to, and take from, the web (distinction due to
Dragomir Radev). It can give to the web technologies such as
summarisation, MT and question-answering. But the giving side of the
equation looks only at short- to medium-term goals. For the longer
term, for 'giving' as well as for other purposes, a deeper
understanding of the linguistic nature of the web and its potential
for CL/NLP is required. For that, we must take the web itself, in
whatever limited way, as an object of study, and uncover what it has
to tell us about the nature of language. The Special Issue will focus
on how we can use the web, rather than how we can help web users.

The issues we expect Special Issue papers to cover include:

      Lexical data derived from the Web
      Classifying Web language; the range of text types on the Web
      Mapping Web documents onto existing ontologies;
                          implications for ontologies
      Clustering in an open corpus
      The multilingual Web as a resource for translation
      CL/HLT engagement with the Semantic Web


Papers should meet the usual criteria for CL; we expect most
submissions to be short papers (up to 15 journal pages, ca. 4000 words),
but long papers (15--30 pages, ca. 8000 words) are also permissible.

SCHEDULE

Papers due: 30 April 2002

SUBMISSION PROCEDURE

Submissions may be either hard copy or soft copy.

Soft copy submissions must meet Computational Linguistics
specifications; see the CL formatting instructions at

  http://www.itri.bton.ac.uk/~Adam.Kilgarriff/cl-format.txt

and are to be sent to Adam.Kilgarriff at itri.brighton.ac.uk.

For hard copy submissions, seven copies are to be sent to

      Adam Kilgarriff
      Web as Corpus Special Issue
      ITRI
      University of Brighton
      Lewes Road
      Brighton BN2 4GJ
      United Kingdom

In this case authors are also requested to submit a soft copy, in ps,
pdf or rtf, to Adam.Kilgarriff at itri.brighton.ac.uk.

Questions about submissions should be directed to the two Guest
Editors, rather than the Journal or Publishing Editors.


