22.720, Calls: Semantics, Computational Ling/USA

Sat Feb 12 02:37:07 UTC 2011

LINGUIST List: Vol-22-720. Fri Feb 11 2011. ISSN: 1068 - 4875.

Subject: 22.720, Calls: Semantics, Computational Ling/USA

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin-Madison  
Monica Macaulay, U of Wisconsin-Madison  
Eric Raimy, U of Wisconsin-Madison  
Joseph Salmons, U of Wisconsin-Madison  
Anja Wanner, U of Wisconsin-Madison  
       <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Amy Brunett <brunett at linguistlist.org>
================================================================  

===========================Directory==============================  

1)
Date: 09-Feb-2011
From: Eugenie Giesbrecht [giesbrecht at fzi.de]
Subject: Distributional Semantics and Compositionality Workshop

-------------------------Message 1 ---------------------------------- 
Date: Fri, 11 Feb 2011 21:36:02
From: Eugenie Giesbrecht [giesbrecht at fzi.de]
Subject: Distributional Semantics and Compositionality Workshop

Full Title: Distributional Semantics and Compositionality Workshop 
Short Title: DiSCo'2011 @ ACL/HLT 

Date: 24-Jun-2011 - 24-Jun-2011
Location: Portland, Oregon, USA 
Contact Person: Eugenie Giesbrecht
Meeting Email: disco2011workshop at gmail.com
Web Site: http://disco2011.fzi.de/ 

Linguistic Field(s): Computational Linguistics; Semantics 

Call Deadline: 01-Apr-2011 

Meeting Description:

ACL/HLT Workshop on Distributional Semantics and Compositionality (DiSCo'2011)
http://disco2011.fzi.de/
June 24, 2011, Portland, Oregon, USA

Any NLP system that does semantic processing relies on the assumption of semantic compositionality: the meaning of a phrase is determined by the meanings of its parts and the way they are combined. However, this assumption does not hold for lexicalized phrases such as idiomatic expressions, which cause problems not only for semantic but also for syntactic processing (see Sag et al. 2001).

Distributional methods in semantics have proved very effective in a wide range of natural language processing tasks, e.g., document retrieval, clustering and classification, question answering, query expansion, word similarity, synonym extraction, relation extraction, and textual advertisement matching in search engines (see Turney and Pantel 2010 for a detailed overview), but they are still strongly limited by being inherently word-based. Dictionaries and other lexical resources do contain multiword entries, but these are expensive to obtain and not available to a sufficient extent for all languages. Moreover, the definition of a multiword varies across resources, and non-compositional phrases are merely a subclass of multiwords.

The workshop brings together researchers who are interested in extracting non-compositional phrases from large corpora by applying distributional models that assign a graded compositionality score to a phrase, as well as researchers interested in expressing compositional meaning with such models. The score denotes the extent to which the compositionality assumption holds for a given expression. It can be used, for example, to decide whether the phrase should be treated as a single unit in applications.

We emphasize that the focus is on automatically acquiring semantic compositionality. Approaches that employ prefabricated lists of non-compositional phrases should consider a different venue.
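
As an illustration of what a graded compositionality score could look like, here is a minimal sketch in Python. It is not the organizers' method: it assumes a simple bag-of-words context window, composes the constituent vectors by addition, and takes the cosine between the observed phrase vector and the composed vector as the score. The function names, the window size and the toy input format (a list of token lists) are all assumptions made for this example.

```python
import math
from collections import Counter


def context_counts(sentences, target, window=2):
    """Count tokens within `window` positions of each occurrence of `target`.

    `sentences` is a list of token lists; `target` is a tuple of one or
    more tokens that must occur contiguously.
    """
    n = len(target)
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                counts.update(left + right)
    return counts


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def compositionality_score(sentences, phrase, window=2):
    """Assumed scoring scheme: cosine between the phrase's own context
    vector and the sum of its constituents' context vectors."""
    phrase_vec = context_counts(sentences, tuple(phrase), window)
    composed = Counter()
    for word in phrase:
        composed.update(context_counts(sentences, (word,), window))
    return cosine(phrase_vec, composed)
```

Under these assumptions, a score near 1 means the phrase occurs in contexts similar to those of its parts, while a low score hints at non-compositional use. A realistic system would use a large corpus, association weighting and possibly a different composition function.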

This event consists of a main session and a shared task.

References:

Sag, I. A., T. Baldwin, F. Bond, A. Copestake, and D. Flickinger. (2001). Multiword Expressions: A Pain in the Neck for NLP. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002), Mexico City, Mexico.

Turney, P. and P. Pantel. (2010). From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research, 37, 141-188. 

2nd Call for Papers:

Test data release: March 31, 2011
Regular paper submission deadline: April 1, 2011
Test data submission and system description deadline: April 8, 2011
Notification of acceptance: April 25, 2011
Camera-ready deadline: May 6, 2011

For the main session, we invite submission of papers on the topic of automatically acquiring a model for semantic compositionality. This includes, but is not limited to:

- Models of Distributional Similarity
- Graph-based models over word spaces
- Vector-space models for distributional semantics
- Applications of semantic compositionality
- Evaluation of semantic compositionality

Authors are invited to submit papers on original, unpublished work in the topic area of this workshop. In addition to long papers presenting completed work, we also invite short papers and demos:

- Long papers should present completed work and should not exceed 8 pages plus 1 page of references.

- Short papers/demos can present work in progress or the description of a system, and should not exceed 4 pages plus 1 page of references.

As reviewing will be blind, please ensure that papers are anonymous. Papers should not include the authors' names and affiliations, or any references to web sites, project names, etc. that reveal the authors' identity.

Shared Task: Call for Participation

The organizers extracted candidate phrases from two large-scale, freely available web corpora, UkWaC and DeWaC (cf. http://wacky.sslmit.unibo.it/), containing POS-tagged English and German text, respectively. These phrases were manually rated for compositionality on Amazon Mechanical Turk. Workers were presented with a sentence containing a bolded target phrase and were asked to score how literal the phrase was on a scale from 0 to 10. For each phrase, 4-5 different, randomly sampled sentences from the WaCky corpora for UK English and German were each presented to 4 workers.
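
The call does not say how the individual worker ratings are combined into one score per phrase. As a rough sketch only, the following Python snippet assumes the simplest option, the mean over all ratings of all sentences for a phrase. The function name and the input layout (phrase mapped to a list of per-sentence rating lists) are hypothetical.

```python
from statistics import mean


def aggregate_judgments(judgments):
    """Average all worker ratings per phrase.

    `judgments` maps a phrase to a list of sentences, each sentence being
    a list of worker ratings on the 0-10 literalness scale.
    Averaging is an assumption; the actual aggregation may differ.
    """
    result = {}
    for phrase, per_sentence in judgments.items():
        ratings = [r for sentence in per_sentence for r in sentence]
        if not ratings:
            raise ValueError(f"no ratings for {phrase!r}")
        if any(not 0 <= r <= 10 for r in ratings):
            raise ValueError(f"rating outside 0-10 for {phrase!r}")
        result[phrase] = mean(ratings)
    return result
```

Other choices, such as averaging per sentence first or discarding outlier workers, would be equally plausible.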

Phrases consist of two lemmas and come in three grammatical relations:

- ADJ_NN: adjective modifying a noun
- V_SUBJ: noun as a subject of a verb
- V_OBJ: noun as an object of a verb

Phrases were extracted semi-automatically. The relations were assigned by patterns and manually checked for validity. Phrases were selected so as to balance the data set while controlling for frequency. The complete data set was split into 40% training, 10% validation and 50% test.
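
For participants who want to reproduce a comparable 40/10/50 partition on their own data, a minimal sketch follows. It assumes a plain seeded random shuffle; the call does not state how the official split was drawn, and the function name and parameters are hypothetical.

```python
import random


def split_dataset(items, fractions=(0.4, 0.1, 0.5), seed=0):
    """Shuffle `items` reproducibly and cut them into train/validation/test.

    The first two fractions set the train and validation sizes; the test
    set receives whatever remains, so no item is lost to rounding.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * fractions[0])
    n_valid = round(len(items) * fractions[1])
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test
```

A split that also preserves the balance across the three grammatical relations would need stratification by relation, which this sketch omits.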

More details on the data set, as well as the download link to the training and validation data, are available from the workshop's website (http://disco2011.fzi.de/).

Participants in the task are free to choose whatever method and data resources they wish to use in their submission. Prefabricated lists of multiwords are not allowed. Since the data set is derived from the WaCky corpora, participants are strongly encouraged to use these freely available text collections to build their models of compositionality, thus ensuring the highest possible comparability of results. Furthermore, since the WaCky corpora are provided already POS tagged and lemmatized, the workload on the participants' side is considerably reduced. Participants may or may not use this information (POS tags and lemmatization). If needed, additional linguistic annotations or processing may also be added to the corpora. To obtain the WaCky corpora, please email us (disco2011workshop at gmail.com) for instructions; this minimizes the load on the WaCky organizers. Of course, you can also contact the WaCky community directly at http://wacky.sslmit.unibo.it/doku.php?id=start.

Participants must also submit a 4-page system description for publication in the workshop volume.

Program Committee:

- Enrique Alfonseca, Google Research, Switzerland
- Tim Baldwin, University of Melbourne, Australia
- Marco Baroni, University of Trento, Italy
- Paul Buitelaar, National University of Ireland, Ireland
- Chris Brockett, Microsoft Research, Redmond, US
- Tim van de Cruys, INRIA, France
- Stefan Evert, University of Osnabrück, Germany
- Antske Fokkens, Saarland University, Germany
- Silvana Hartmann, TU Darmstadt, Germany
- Alfio Massimiliano Gliozzo, IBM, Hawthorne, NY, USA
- Mirella Lapata, University of Edinburgh, UK
- Ted Pedersen, University of Minnesota, Duluth, USA
- Yves Peirsman, Stanford University, USA
- Peter D. Turney, National Research Council Canada, Canada
- Magnus Sahlgren, Gavagai, Sweden
- Serge Sharoff, University of Leeds, UK
- Anders Søgaard, University of Copenhagen, Denmark
- Daniel Sonntag, German Research Center for AI, Germany
- Diana McCarthy, Lexical Computing Ltd., UK
- Dominic Widdows, Google, USA

Workshop Chairs:

- Chris Biemann, San Francisco, USA
- Eugenie Giesbrecht, FZI Research Center for Information Technology at the University of Karlsruhe, Germany
- Emiliano Guevara, Institute for Linguistics and Scandinavian Studies, University of Oslo, Norway

Contact email: disco2011workshop @ gmail.com

-----------------------------------------------------------
LINGUIST List: Vol-22-720	
----------------------------------------------------------