[Corpora-List] Final Call for Participation: Word Sense Induction and Disambiguation for Graded Senses

David Alan Jurgens jurgens at di.uniroma1.it
Thu Feb 14 19:31:14 UTC 2013


Final Call for Participation:
Word Sense Induction and Disambiguation for Graded Senses

(SemEval-2013, Task 13)


http://www.cs.york.ac.uk/semeval-2013/task13/


In keeping with the strong tradition of Word Sense Disambiguation at
SenseEval and SemEval, we are pleased to invite participants to
SemEval-2013 Task 13 on word senses with graded applicability in context.
Previous tasks on word senses have largely assumed that each usage of a
word is best labeled by a single sense.  In contrast, Task 13 proposes that
usages should be labeled with all senses that apply, with weights indicating
the degree of applicability.  This multi-sense labeling captures both cases
where related senses from a fine-grained sense inventory apply and cases
where contextual ambiguity permits alternate interpretations.  We
illustrate this with three example sentences:

   - The student loaded paper into the printer.
   - The student submitted her paper by email.
   - The student handed her paper to the teacher at the beginning of class.

according to the first two senses of paper in WordNet 3.1:

   1. paper - a material made of cellulose pulp derived mainly from wood or
   rags or certain grasses
   2. paper - an essay, especially one written as an assignment

The first sentence refers to the material sense of paper, while the second
refers to the essay sense.  In contrast, both senses are possible
interpretations of the third sentence, though to different degrees; here,
the usage evokes two distinct properties of the concept, its form (a
cellulose material) and its purpose (an assignment), which correspond to
distinct senses of paper.  Similar multi-label conditions may also arise
for word uses where a reader perceives multiple, unrelated interpretations
due to contextual ambiguity.  While most previous work on WSD makes a best
guess as to which interpretation is correct, Task 13 makes the ambiguity
explicit in the multi-sense labeling.
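
As a rough illustration of what such a graded, multi-sense label might look
like in practice, the sketch below represents a single usage as a mapping
from sense identifiers to applicability weights.  The identifiers and
weights are invented for illustration only; they are not taken from the
task's gold-standard annotations or its official submission format.

    # A hypothetical graded sense labeling for the third example sentence.
    # The sense identifiers and weights are placeholders for illustration;
    # the actual task uses WordNet 3.1 senses and its own submission format.
    labeling = {
        "paper.material": 0.4,  # form: a material made of cellulose pulp
        "paper.essay": 0.8,     # purpose: an essay written as an assignment
    }

    # A traditional single-sense labeling would instead keep only the
    # highest-weighted sense, discarding the graded information:
    best_sense = max(labeling, key=labeling.get)  # "paper.essay"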

*Task*

Task 13 evaluates Word Sense Induction (WSI) and Unsupervised WSD systems
in two settings: (1) a WSD task and, for sense induction systems, (2) a
clustering comparison setting that evaluates the similarity of the induced
sense inventories.  Participants are presented with example contexts of
each word and are asked to label each usage with as many senses as they
think are applicable, along with numeric weights denoting the relative
levels of applicability.  Words will be balanced across parts of speech and
number of senses, using nouns, verbs and adjectives.  In addition, the data
set will include several highly polysemous words (15+ senses) for each part
of speech.  Word senses are drawn from WordNet 3.1.


*Participation*

The focus of this task is on unsupervised systems, and we therefore solicit
participation from two types of systems.  First, following previous SemEval
tasks on WSI, we solicit systems that learn the senses themselves and then
label the test data using their induced senses.  Second, we also solicit
Unsupervised WSD systems trained on WordNet 3.1 that label using the same
sense inventory as the test data.

Both types of systems will be evaluated jointly on the first subtask using
a series of graded sense label comparisons.  Sense induction systems will
also be evaluated using unsupervised clustering measures.
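
Purely as a sketch of the kind of graded comparison involved (this is an
illustrative similarity function, not the task's official scoring measures,
which are documented on the task website), a system's weighted sense
labeling for a usage could be compared against a gold labeling with a
generalized, weighted Jaccard score:

    def weighted_jaccard(gold, system):
        """Generalized Jaccard similarity between two weighted sense labelings.

        Both arguments map sense identifiers to non-negative applicability
        weights; senses missing from one labeling count as weight 0.  This
        is an illustrative measure only, not the official Task 13 metric.
        """
        senses = set(gold) | set(system)
        overlap = sum(min(gold.get(s, 0.0), system.get(s, 0.0)) for s in senses)
        total = sum(max(gold.get(s, 0.0), system.get(s, 0.0)) for s in senses)
        return overlap / total if total else 1.0

    gold = {"paper.material": 0.4, "paper.essay": 0.8}
    predicted = {"paper.essay": 1.0}
    print(weighted_jaccard(gold, predicted))  # -> about 0.57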


*Data*

Because the task targets unsupervised WSD and WSI systems, no training data
is provided.  However, for WSI systems, the ukWaC corpus
(http://wacky.sslmit.unibo.it/doku.php?id=corpora) has been specified as
the official dataset from which senses are to be learned.  In contrast to
past SemEval tasks on WSI, a significantly larger corpus is being used to
facilitate all-words WSI methods.

Trial data for the task has been released and includes an example dataset
of eight words for the first subtask, along with the evaluation measures.


*Important Dates*

Please note that interested parties should register even if they later
decline to submit a system.

August 7, 2012: Trial Data 1.0 released
November 1, 2012: Start of evaluation period
January 10, 2013: Trial Data 1.1 released
*February 15, 2013*: Registration deadline for task participants
March 15, 2013: End of evaluation period
April 9, 2013: Paper submission deadline (to be confirmed)


*Organizers*
David Jurgens (lastname at di.uniroma1.it), Sapienza University of Rome, Italy
Ioannis Klapaftis (lastname at outlook.com), Microsoft (Bing), United Kingdom


*Contact*

If interested, please join the discussion on our Google Group:
https://groups.google.com/group/semeval-2013-task-13.