[Corpora-List] 2nd CfP: Workshop on Scalability in Natural Language Processing

Leon Derczynski leon at dcs.shef.ac.uk
Mon Jun 17 08:27:20 UTC 2013


****************************************************************
Second Call for Papers

Workshop on Scalability in Natural Language Processing
https://sites.google.com/site/scanlp2013/

Full-day workshop in conjunction with RANLP 2013

Deadline: 3 July 2013, 23:59 Hawaii Time
****************************************************************

This workshop, held in conjunction with RANLP 2013, aims to introduce
contemporary work and to discuss novel methods for natural language
processing at a large scale, and explore how the resulting technology
and methods can be reused in applications both on the Web and in
the physical world.

DESCRIPTION

For a processing approach to be scalable, it should be able to take
on large volumes of data, work through them at high speed, and
smoothly adapt to changes in these needs. We discuss this
in the context of NLP, with particular focus on the core tasks
of resource creation, discourse processing, and evaluation.

Now is a particularly important time to develop scalable methods
in our field. Big data is here and the benefits of effectively
getting through it remain to be harvested by the pioneers. Huge
datasets are becoming available: Google Books contains 155 billion
tokens, over which only shallow surveys have been conducted; the
new Common Crawl web corpus contains over 60 terabytes of text and
metadata. But size alone is not a driver for scalable methods -
the rapid text content creation we see every day presents masses
of data we are not yet equipped to handle. For example, Twitter
alone is responsible for 500 million microtexts every day; the
publicly-visible Wordpress.org holds a part of the 2 million
blog documents we create every 24 hours.

Big text data is not only becoming prolific; demand for it is also
high. The fast, un-curated nature of microtext has been
shown to be of value in stock valuation by multiple researchers.
User location and movement analysis enables powerful search and
analysis modes, such as computational journalism and powerful
personalisation. Sentiment detection informs corporations,
governance and political activities. Media monitoring requires
extracting and co-referring entities and events from thousands
of outlets in real time. And finally, the emerging field of
deep learning places but one core demand in all its guises:
large amounts of data. All these applications' pressures
create a demand for NLP that can be done quickly and broadly.

There is more demand than ever for scalable natural language
processing. Many organisations are interested in the potential
results as big data becomes better defined and data-intensive
approaches to computational linguistics reach production-level
performance. Enormous quantities of data, from user input to
news archives, are being mined using more powerful and
computationally demanding techniques. The organisation, variety,
integrity and public availability of the resulting resources will
have a major impact on how we continue to do science.

Newly introduced data-intensive approaches to computational
linguistics continue to thrive on input volume; we need scalable
technology to handle the next order of magnitude in corpus
sizes and, given the nature of language, to continue
data-intensive advances in our field.

============================================================================
TOPICS OF INTEREST

With regard to Scalable NLP, we aim to encourage discussion
regarding three key areas of natural language processing:
resource creation; processing of discourse; and evaluation:

-- General scalability issues
-- Application approaches
-- Performance limits
-- Flexible resource creation
-- Parallelising annotation
-- Handling huge corpora
-- Crowdsourcing for corpus creation
-- Decomposing resource creation tasks
-- Rapid or realtime annotation quality assessment
-- Running NLP in the cloud
-- Privacy issues
-- NLP application optimisation / parallelisation
-- Scalable machine learning for NLP
-- High performance computing for NLP
-- Rapid evaluation
-- On-line learning for NLP
-- Reinforcement learning
-- Iterative and ensemble learning
-- Hypothesis generation

In addition to the invited talk and presentations, the
workshop will include a 30-minute hands-on demonstration slot
with participants doing NLP in the cloud using GATECloud,
possibly including social media processing using GATE TwitIE
(supported and funded by the organisers).

============================================================================

IMPORTANT DATES

Submission deadline: 3 July 2013
Notification of acceptance: 2 August 2013
Camera-ready copies due: 16 August 2013
Workshop date: 12/13 September 2013


============================================================================

SUBMISSION

Submission is via SoftConf:

https://www.softconf.com/ranlp13/ScaNLP/

All submissions must be in PDF format and must follow the RANLP
template (http://lml.bas.bg/ranlp2013/submissions.php#styles)

Multiple submission policy: We welcome papers that are under review for
other venues, but, in the event of multiple acceptances, authors are
requested to notify us and choose which meeting to present and publish the
work at as soon as possible - we cannot accept for publication or
presentation work that will be (or has been) published elsewhere.

Reviewing: Reviewing will be blind. No information identifying the authors
should be in the paper: this includes not only the authors' names and
affiliations, but also self-references that reveal authors' identities; for
example, "We have previously shown (Smith 1999)" should be changed to "Smith
(1999) has previously shown".

Paper length and presentation: We invite long (8 pages) and short (4 pages) papers.
Accepted short papers will be presented either as short oral presentations
or as posters.

============================================================================

ORGANIZERS

Leon Derczynski, University of Sheffield, UK
Kalina Bontcheva, University of Sheffield, UK
Bin Yang, Aarhus University, Denmark
Valentin Tablan, University of Sheffield, UK
Arno Scharl, MODUL University Vienna, Austria
Thierry Declerck, DFKI, Germany

============================================================================

PROGRAMME COMMITTEE:

Galia Angelova, Bulgarian Academy of Sciences, Bulgaria
Srikanta Bedathur, Indraprastha Institute of Information Technology, India
Sebastien Bratieres, University of Cambridge, UK
Kai-wei Chang, University of Illinois Urbana-Champaign, USA
Freddy Chong-Tat Chua, Singapore Management University, Singapore
Trevor Cohn, University of Sheffield, UK
Hamish Cunningham, University of Sheffield, UK
David Martins de Matos, L2F INESC ID, Portugal
Ted Dunning, MapR Technologies, USA
Chris Dyer, Carnegie Mellon University, USA
Rainer Gemulla, Max Planck Institut für Informatik, Germany
Amit Goyal, University of Maryland, USA
Christian S. Jensen, Aarhus University, Denmark
Vinh Ngoc Khuc, Ohio State University, USA
Oleksandr Kolomiyets, KU Leuven, Belgium
Hector Llorens, Nuance, Spain
Barry Norton, Ontotext, UK
Miles Osborne, University of Edinburgh, UK
Weining Qian, East China Normal University, China
Alan Ritter, University of Washington, USA
Matthew Rowe, Lancaster University, UK
Marta Sabou, MODUL University Vienna, Austria
Sina Samangooei, University of Southampton, UK
Sebastian Schelter, TU Berlin / Apache Software Foundation, Germany
Darius Sidlauskas, Aarhus University, Denmark
Marc Spaniol, Max Planck Institut für Informatik, Germany
Andreas Vlachos, University of Cambridge, UK


============================================================================

SUPPORT

The ScaNLP workshop is partially supported by GATE, the EU FP7 projects
TrendMiner (http://www.trendminer-project.eu/) and AnnoMarket
(https://annomarket.eu/), and the CHIST-ERA uComp (http://www.ucomp.eu/)
project.


-- 
Leon R A Derczynski
Research Associate, NLP Group

Department of Computer Science
University of Sheffield
Regent Court, 211 Portobello
Sheffield S1 4DP, UK

+45 5157 4948
http://www.dcs.shef.ac.uk/~leon/