[Corpora-List] Deadline extension: CFP: 7th Web as Corpus Workshop (WAC-7)
Serge Sharoff
s.sharoff at leeds.ac.uk
Sun Jan 22 20:52:05 UTC 2012
*Deadline extension*
* Submission by *January 29, 2012* to be made through
https://www.easychair.org/conferences/?conf=wac7
* Notification of acceptance by February 6
* Camera-ready copy due February 15
To be held in association with WWW2012 in Lyon, France, 17th April 2012
Sponsored by ACL SIGWAC, http://www.sigwac.org.uk
More and more people are using Web data for linguistic and NLP research:
the Web provides an easy source of linguistic data in a great variety of
languages. However, a ‘crawl’ is not ready for exploration in the same way
a traditional ‘corpus’ is. We need to turn a crawl into a corpus. The
workshop, the seventh in an annual series, provides a venue for exploring
what this involves, how to do it, and what we find out if we do.
We invite submissions which:
* describe Web corpus collection projects, or modules for one part of
the process (crawling, filtering, de-duplication, language-id,
tokenising, indexing, ...)
* explore characteristics of Web data from a linguistics/NLP
perspective including registers, domains, frequency distributions,
comparisons between datasets
* use crawled Web data for NLP purposes (with emphasis on the data
rather than the use)
The previous WAC workshops have been co-located with various conferences
in computational linguistics. This time the workshop is co-located with
WWW2012, the main international conference on Web technologies and their
impact on society.
== Organising committee ==
* Adam Kilgarriff (Lexical Computing Ltd.)
* Jan Pomikalek (Masaryk University)
* Serge Sharoff (University of Leeds, Workshop Chair)