<div dir="ltr"><div>****************************************************************</div><div>First Call for Papers</div><div><br></div><div>Workshop on Scalability in Natural Language Processing</div><div><a href="https://sites.google.com/site/scanlp2013/">https://sites.google.com/site/scanlp2013/</a></div>
<div><br></div><div>Full-day workshop in conjunction with RANLP 2013</div><div><br></div><div>Deadline: 3 July 2013, 23:59 Hawaii Time</div><div>**************************************************************** </div><div>
<br></div><div>This workshop, held in conjunction with RANLP 2013, aims to introduce </div><div>contemporary work and to discuss novel methods for natural language </div><div>processing at a large scale, and to explore how the resulting technology </div>
<div>and methods can be reused in applications both on the Web and in </div><div>the physical world. </div><div><br></div><div>DESCRIPTION</div><div><br></div><div>For a processing approach to be scalable, it should be able to take on </div>
<div>large volumes of data, work through them at high speed, and </div><div>adapt smoothly to changes in these needs. We discuss this </div><div>in the context of NLP, with particular focus on the core tasks </div>
<div>of resource creation, discourse processing, and evaluation.</div><div><br></div><div>Now is a particularly important time to develop scalable methods </div><div>in our field. Big data is here, and the benefits of processing it </div>
<div>effectively remain to be harvested by the pioneers. Huge </div><div>datasets are becoming available: Google Books contains 155 billion </div><div>tokens, over which only shallow surveys have been conducted; the </div>
<div>new Common Crawl web corpus contains over 60 terabytes of text and </div><div>metadata. But size is not the only driver for scalable methods: </div><div>the rapid creation of text content that we see every day presents masses </div>
<div>of data we are not yet equipped to handle. For example, Twitter </div><div>alone is responsible for 500 million microtexts every day, and the </div><div>publicly visible WordPress.org hosts part of the 2 million </div>
<div>blog documents we create every 24 hours.</div><div><br></div><div>As well as becoming prolific, big text data is also in </div><div>high demand. The fast, uncurated nature of microtext has been </div><div>
shown to be of value in stock valuation by multiple researchers. </div><div>User location and movement analysis enables powerful search and </div><div>analysis modes, such as computational journalism and </div><div>
personalisation. Sentiment detection informs corporate, </div><div>governmental and political activities. Media monitoring requires </div><div>extracting and co-referring entities and events from thousands </div><div>of outlets in real time. Finally, the emerging field of </div>
<div>deep learning makes one core demand in all its guises: </div><div>large amounts of data. Together, these applications </div><div>create a demand for NLP that can be done quickly and broadly.</div><div><br>
</div><div>There is more demand than ever for scalable natural language </div><div>processing. Many organisations are interested in the potential </div><div>results as big data becomes better defined and data-intensive </div>
<div>approaches to computational linguistics reach production-level </div><div>performance. Enormous quantities of data, from user input to </div><div>news archives, are being mined using more powerful and </div><div>computationally demanding techniques. The organisation, variety,</div>
<div>integrity and public availability of the resulting resources will</div><div>have a major impact on how we continue to do science.</div><div><br></div><div>Newly introduced data-intensive approaches to computational </div>
<div>linguistics continue to thrive on input volume; we need scalable </div><div>technology to handle the next order of magnitude in corpus </div><div>sizes and, given the nature of language, to sustain </div><div>data-intensive advances in our field.</div>
<div><br></div><div>============================================================================</div><div>TOPICS OF INTEREST</div><div><br></div><div>With regard to scalable NLP, we aim to encourage discussion </div><div>
of three key areas of natural language processing: </div><div>resource creation, discourse processing, and evaluation. Topics include:</div><div><br></div><div><span class="" style="white-space:pre"> </span>-- General scalability issues</div>
<div><span class="" style="white-space:pre"> </span>-- Application approaches</div><div><span class="" style="white-space:pre"> </span>-- Performance limits</div><div><span class="" style="white-space:pre"> </span>-- Flexible resource creation</div>
<div><span class="" style="white-space:pre"> </span>-- Parallelising annotation</div><div><span class="" style="white-space:pre"> </span>-- Handling huge corpora</div><div><span class="" style="white-space:pre"> </span>-- Crowdsourcing for corpus creation</div>
<div><span class="" style="white-space:pre"> </span>-- Decomposing resource creation tasks</div><div><span class="" style="white-space:pre"> </span>-- Rapid or realtime annotation quality assessment</div><div><span class="" style="white-space:pre"> </span>-- Running NLP in the cloud</div>
<div><span class="" style="white-space:pre"> </span>-- Privacy issues</div><div><span class="" style="white-space:pre"> </span>-- NLP application optimisation / parallelisation</div><div><span class="" style="white-space:pre"> </span>-- Scalable machine learning for NLP</div>
<div><span class="" style="white-space:pre"> </span>-- High performance computing for NLP</div><div><span class="" style="white-space:pre"> </span>-- Rapid evaluation</div><div><span class="" style="white-space:pre"> </span>-- On-line learning for NLP</div>
<div><span class="" style="white-space:pre"> </span>-- Reinforcement learning</div><div><span class="" style="white-space:pre"> </span>-- Iterative and ensemble learning</div><div><span class="" style="white-space:pre"> </span>-- Hypothesis generation</div>
<div><br></div><div>In addition to the invited talk and presentations, the </div><div>workshop will include a 30-minute hands-on demonstration slot </div><div>with participants doing NLP in the cloud using GATECloud, </div>
<div>possibly including social media processing using GATE TwitIE </div><div>(supported and funded by the organisers).</div><div><br></div><div>============================================================================</div>
<div><br></div><div>IMPORTANT DATES</div><div><br></div><div>Submission deadline: 5 July 2013</div><div>Notification of acceptance: 2 August 2013</div><div>Camera-ready copies due: 16 August 2013</div><div>Workshop date: 12/13 September 2013</div>
<div><br></div><div><br></div><div>============================================================================</div><div><br></div><div>SUBMISSION</div><div><br></div><div>Submission is via EasyChair: </div><div><br></div>
<div><a href="https://www.easychair.org/conferences/?conf=scanlp2013">https://www.easychair.org/conferences/?conf=scanlp2013</a></div><div><br></div><div>All submissions must be in PDF format and must follow the RANLP </div>
<div>template (<a href="http://lml.bas.bg/ranlp2013/submissions.php#styles">http://lml.bas.bg/ranlp2013/submissions.php#styles</a>). </div><div><br></div><div>Multiple submission policy: We welcome papers that are under review for</div>
<div>other venues, but, in the event of multiple acceptances, authors are</div><div>requested to notify us as soon as possible and to choose the meeting at which to present and publish the</div><div>work; we cannot accept for publication or</div>
<div>presentation work that will be (or has been) published elsewhere.</div><div><br></div><div>Reviewing: Reviewing will be blind. No information identifying the authors</div><div>should be in the paper: this includes not only the authors' names and</div>
<div>affiliations, but also self-references that reveal authors' identities; for</div><div>example, "We have previously shown (Smith 1999)" should be changed to "Smith</div><div>(1999) has previously shown".</div>
<div><br></div><div>Paper length and presentation: We invite long (8 pages) and short (4 pages) papers.</div><div>Accepted short papers will be presented either as short oral presentations</div><div>or as posters.</div><div><br></div>
<div>============================================================================</div><div><br></div><div>ORGANIZERS</div><div><br></div><div>Leon Derczynski, University of Sheffield, UK</div><div>Kalina Bontcheva, University of Sheffield, UK</div>
<div>Bin Yang, Aarhus University, Denmark</div><div>Valentin Tablan, University of Sheffield, UK</div><div>Arno Scharl, MODUL University Vienna, Austria</div><div>Thierry Declerck, DFKI, Germany</div><div><br></div><div>============================================================================</div>
<div><br></div><div>PROGRAMME COMMITTEE:</div><div><br></div><div>Galia Angelova, Bulgarian Academy of Sciences, Bulgaria</div><div>Srikanta Bedathur, Indraprastha Institute of Information Technology, India</div><div>Kai-wei Chang, University of Illinois Urbana-Champaign, USA</div>
<div>Freddy Chong-Tat Chua, Singapore Management University, Singapore</div><div>Hamish Cunningham, University of Sheffield, UK</div><div>David Martins de Matos, L2F INESC ID, Portugal</div><div>Ted Dunning, MapR Technologies, USA</div>
<div>Chris Dyer, Carnegie Mellon University, USA</div><div>Rainer Gemulla, Max Planck Institut für Informatik, Germany</div><div>Amit Goyal, University of Maryland, USA</div><div>Christian S. Jensen, Aarhus University, Denmark</div>
<div>Vinh Ngoc Khuc, Ohio State University, USA</div><div>Oleksandr Kolomiyets, KU Leuven, Belgium</div><div>Hector Llorens, Nuance, Spain</div><div>Barry Norton, Ontotext, UK</div><div>Miles Osborne, University of Edinburgh, UK</div>
<div>Weining Qian, East China Normal University, China</div><div>Alan Ritter, University of Washington, USA</div><div>Matthew Rowe, Lancaster University, UK</div><div>Marta Sabou, MODUL University Vienna, Austria</div><div>
Sina Samangooei, University of Southampton, UK</div><div>Sebastian Schelter, TU Berlin / Apache Software Foundation, Germany</div><div>Darius Sidlauskas, Aarhus University, Denmark</div><div>Marc Spaniol, Max Planck Institut für Informatik, Germany</div>
<div>Andreas Vlachos, University of Cambridge, UK</div><div><br></div><div><br></div><div>============================================================================</div><div><br></div><div>SUPPORT</div><div><br></div><div>
The ScaNLP workshop is partially supported by GATE, the EU FP7 projects </div><div>TrendMiner (<a href="http://www.trendminer-project.eu/">http://www.trendminer-project.eu/</a>) and AnnoMarket (<a href="https://annomarket.eu/">https://annomarket.eu/</a>), </div>
<div>and the CHIST-ERA uComp (<a href="http://www.ucomp.eu/">http://www.ucomp.eu/</a>) project.</div><div><br></div>-- <br>Leon R A Derczynski<br>Research Associate, NLP Group<br><br>Department of Computer Science<br>
University of Sheffield<br>Regent Court, 211 Portobello<br>Sheffield S1 4DP, UK<br><br>+45 5157 4948<br><a href="http://www.dcs.shef.ac.uk/~leon/" target="_blank">http://www.dcs.shef.ac.uk/~leon/</a>
</div>