Call: Workshop on Scalability in Natural Language Processing

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Wed Apr 10 10:04:10 UTC 2013

Date: Tue, 9 Apr 2013 23:40:53 +0200
From: Leon Derczynski <leon at>
Message-ID: <CAPjwwFr=fVsqtBJW0DpxB8j4GOdZZUkQeGn+mvkU9s_cSu6YGg at>

First Call for Papers

Workshop on Scalability in Natural Language Processing

Full-day workshop in conjunction with RANLP 2013

Deadline: 3 July 2013, 23:59 Hawaii Time

This workshop, held in conjunction with RANLP 2013, aims to introduce
contemporary work and to discuss novel methods for natural language
processing at a large scale, and explore how the resulting technology
and methods can be reused in applications both on the Web and in the
physical world.


For a processing approach to be scalable, it should be able to take on
large volumes of data, work through them at high speed, and smoothly
adapt to changes in these needs. We discuss this in the context
of NLP, with particular focus on the core tasks of resource creation,
discourse processing, and evaluation.

Now is a particularly important time to develop scalable methods in our
field. Big data is here and the benefits of effectively getting through
it remain to be harvested by the pioneers. Huge datasets are becoming
available: Google Books contains 155 billion tokens, over which only
shallow surveys have been conducted; the new Common Crawl web corpus
contains over 60 terabytes of text and metadata. But size alone is not a
driver for scalable methods - the rapid text content creation we see
every day presents masses of data we are not yet equipped to handle. For
example, Twitter alone is responsible for 500 million microtexts every
day; the publicly-visible holds a part of the 2 million
blog documents we create every 24 hours.

As well as big text data becoming prolific, demand for this data is also
high. The fast, un-curated nature of microtext has been shown to be of
value in stock valuation by multiple researchers. User location and
movement analysis enables powerful search and analysis modes, such as
computational journalism and powerful personalisation. Sentiment
detection informs corporations, governance and political
activities. Media monitoring requires extracting and co-referring
entities and events from thousands of outlets in real time. And finally,
the emerging field of deep learning places but one core demand in all
its guises: large amounts of data. All these applications' pressures
create a demand for NLP that can be done quickly and broadly.

There is more demand than ever for scalable natural language
processing. Many organisations are interested in the potential results
as big data becomes better defined and data-intensive approaches to
computational linguistics reach production-level performance. Enormous
quantities of data, from user input to news archives, are being mined
using more powerful and computationally demanding techniques. The
organisation, variety, integrity and public availability of the
resulting resources will have a major impact on how we continue to do

Newly introduced data-intensive approaches to computational linguistics
continue to thrive on input volume; we need scalable technology to handle
the next order of magnitude in corpus sizes and, given the nature of
language, to continue data-intensive advances in our field.


With regard to Scalable NLP, we aim to encourage discussion regarding
three key areas of natural language processing: resource creation;
processing of discourse; and evaluation:

-- General scalability issues
-- Application approaches
-- Performance limits
-- Flexible resource creation
-- Parallelising annotation
-- Handling huge corpora
-- Crowdsourcing for corpus creation
-- Decomposing resource creation tasks
-- Rapid or realtime annotation quality assessment
-- Running NLP in the cloud
-- Privacy issues
-- NLP application optimisation / parallelisation
-- Scalable machine learning for NLP
-- High performance computing for NLP
-- Rapid evaluation
-- On-line learning for NLP
-- Reinforcement learning
-- Iterative and ensemble learning
-- Hypothesis generation

In addition to the invited talk and presentations, the workshop will
include a 30-minute hands-on demonstration slot with participants doing
NLP in the cloud using GATECloud, possibly including social media
processing using GATE TwitIE (supported and funded by the organisers).



Submission deadline: 5 July 2013
Notification of acceptance: 2 August 2013
Camera-ready copies due: 16 August 2013
Workshop date: 12/13 September 2013



Submission is via EasyChair:

All submissions must be in PDF format and must follow the RANLP template.

Multiple submission policy: We welcome papers that are under review for
other venues. In the event of multiple acceptances, however, authors are
requested to notify us as soon as possible and to choose at which
meeting the work will be presented and published; we cannot accept for
publication or presentation work that will be (or has been) published
elsewhere.

Reviewing: Reviewing will be blind. No information identifying the
authors should be in the paper: this includes not only the authors'
names and affiliations, but also self-references that reveal authors'
identities; for example, "We have previously shown (Smith 1999)" should
be changed to "Smith (1999) has previously shown".

Paper length and presentation: We invite long (8 pages) and short
(4 pages) papers.
Accepted short papers will be presented either as short oral
presentations or as posters.



Leon Derczynski, University of Sheffield, UK
Kalina Bontcheva, University of Sheffield, UK
Bin Yang, Aarhus University, Denmark
Valentin Tablan, University of Sheffield, UK
Arno Scharl, MODUL University Vienna, Austria
Thierry Declerck, DFKI, Germany



Galia Angelova, Bulgarian Academy of Sciences, Bulgaria
Srikanta Bedathur, Indraprastha Institute of Information Technology, India
Kai-wei Chang, University of Illinois Urbana-Champaign, USA
Freddy Chong-Tat Chua, Singapore Management University, Singapore
Hamish Cunningham, University of Sheffield, UK
David Martins de Matos, L2F INESC ID, Portugal
Ted Dunning, MapR Technologies, USA
Chris Dyer, Carnegie Mellon University, USA
Rainer Gemulla, Max Planck Institut für Informatik, Germany
Amit Goyal, University of Maryland, USA
Christian S. Jensen, Aarhus University, Denmark
Vinh Ngoc Khuc, Ohio State University, USA
Oleksandr Kolomiyets, KU Leuven, Belgium
Hector Llorens, Nuance, Spain
Barry Norton, Ontotext, UK
Miles Osborne, University of Edinburgh, UK
Weining Qian, East China Normal University, China
Alan Ritter, University of Washington, USA
Matthew Rowe, Lancaster University, UK
Marta Sabou, MODUL University Vienna, Austria
Sina Samangooei, University of Southampton, UK
Sebastian Schelter, TU Berlin / Apache Software Foundation, Germany
Darius Sidlauskas, Aarhus University, Denmark
Marc Spaniol, Max Planck Institut für Informatik, Germany
Andreas Vlachos, University of Cambridge, UK



The ScaNLP workshop is partially supported by GATE, the EU FP7 projects
TrendMiner and AnnoMarket, and the CHIST-ERA uComp project.
