[Corpora-List] Fwd: CfP: Concept Extraction Challenge at the 3rd Workshop on Making Sense of Microposts (#MSM2013) @WWW2013 - chance to win $1500
Andrea Varga
andrea.job06 at yahoo.com
Thu Jan 17 23:50:43 UTC 2013
apologies for cross-posting

==========================================================================
Concept Extraction Challenge
@ the 3rd Workshop on Making Sense of Microposts (#MSM2013) at WWW 2013
http://oak.dcs.shef.ac.uk/msm2013/challenge.html
13th May 2013, Rio de Janeiro, Brazil
==========================================================================

#MSM2013 will host a 'Concept Extraction Challenge', with a prize
sponsored by eBay, where participants must label Microposts in a given
dataset with the concepts referenced. Existing concept extraction tools are
intended for use over news corpora and similar document-based corpora
containing relatively long documents. The aim of the challenge is to foster
research into novel, more accurate concept extraction for (much shorter)
Micropost data.

The goal of the challenge is to detect concepts contained in Microposts.
Concepts are defined as abstract notions of things; for this challenge we
are constraining the task to the extraction of entity concepts
characterised by an entity type and an entity value. We consider four
entity types, defined as follows:

1. Person (PER) - references in the Micropost to a full or partial person
name.
Example: Obama responds to diversity criticism
Extracted instances: PER/Obama;

2. Location (LOC) - references in the Micropost to full or partial location
names including: cities, provinces or states, countries, continents and
(physical) facilities.
Example: Finally on the train to London ahhhh
Extracted instances: LOC/London;

3. Organisation (ORG) - references in the Micropost to full or partial
organisation names including academic, state, governmental, military and
business or enterprise organisations.
Example: NASA's Donated Spy Telescopes May Aid Dark Energy Search
Extracted instances: ORG/NASA;

4. Miscellaneous (MISC) - references in the Micropost to a concept not
covered by any of the categories above, but limited to one of the entity
types: film/movie, entertainment award event, political event, programming
language, sporting event, TV show, nationality, and (spoken or written)
language.
Example: Okay, now this is getting seriously bizarre. Like a Monty Python script gone wrong.
Extracted instances: MISC/Monty Python;
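
To make this notation concrete, here is a minimal sketch in Python of
parsing the TYPE/value notation used in the examples above into typed
pairs (the function name and type-set constant are ours, for illustration
only):

    ENTITY_TYPES = {"PER", "LOC", "ORG", "MISC"}

    def parse_concepts(concepts_field):
        """Split a semi-colon-separated concepts string, e.g. "PER/Obama;ORG/NASA",
        into (entity_type, entity_value) pairs, keeping only the four types above."""
        pairs = []
        for chunk in concepts_field.split(";"):
            chunk = chunk.strip()
            if not chunk:
                continue  # a trailing ';' leaves an empty chunk
            etype, sep, evalue = chunk.partition("/")
            if sep and etype in ENTITY_TYPES and evalue:
                pairs.append((etype, evalue))
        return pairs

    # parse_concepts("PER/Obama;ORG/NASA") -> [("PER", "Obama"), ("ORG", "NASA")]
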
DATASET
-----
Two datasets covering a variety of topics of discussion have been provided:
one for training and one for testing. The complete dataset (both training
and testing data) contains 4265 microposts, manually annotated using the
above definitions. The dataset is split 60%/40% between training and testing.

Training Dataset
-----
A tab-separated file with the following elements per micropost (a short
reading sketch follows the list):
- Element 1: The numeric ID of the micropost
- Element 2: The concepts found within the micropost, described by an entity
type and an entity instance. These are semi-colon-separated values
(e.g. PER/Obama;ORG/NASA).
- Element 3: The content of the micropost - this is what the concepts were
detected and extracted from.
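
As an illustration only, a minimal sketch of reading a training file in
this layout (the file name and function name are assumptions, not official
challenge code):

    def read_training(path="msm2013_training.tsv"):
        """Yield (micropost_id, concepts, text) per line, where concepts is a
        list of (entity_type, entity_value) pairs taken from the
        semi-colon-separated field."""
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                parts = line.split("\t", 2)  # Element 1: ID, 2: concepts, 3: content
                if len(parts) != 3:
                    continue  # skip malformed lines
                micropost_id, concepts_field, text = parts
                concepts = [tuple(c.split("/", 1))
                            for c in concepts_field.split(";") if "/" in c]
                yield micropost_id, concepts, text
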
Test Dataset
-----
Also a tab-separated file, but unlike the training dataset the concepts have
not been extracted:
- Element 1: The numeric ID of the micropost
- Element 2: The content of the micropost - this is what you must use to
detect and extract the concepts contained.

Anonymisation and Special Terms
-----
To ensure anonymity, all username mentions in the microposts have been
replaced with '_Mention_', and all URLs with '_URL_'.

Data Access
-----
The datasets can be downloaded from:
http://oak.dcs.shef.ac.uk/msm2013/ie_challenge

EVALUATION
------------
In order to evaluate your submissions, we require you to submit (along with
a paper describing your approach) a tab-separated value (TSV) file with the
following format for the microposts in the test dataset:
- Element 1: The numeric ID of the micropost.
- Element 2: The entity type and entity instance detected in each micropost.
These are semi-colon-separated values (e.g. PER/Obama;ORG/NASA).

For instance, your results would be formatted as:
2560 PER/Obama;ORG/NASA
2561
2562 ORG/FDA;
…

This file will be parsed and the accuracy of each approach computed.
Accuracy will be judged using the f-measure (with beta = 1, so precision
and recall are weighted equally). This will be computed on a per
entity-type/entity-instance pair basis and then averaged across the four
entity types. We will also provide entity-type specific f-measure values
for each team, to assess how each approach fares across the different
concepts.
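
For intuition about this metric, here is a minimal sketch of the scoring
scheme as described above (an illustration only, not the official
evaluation script; the data structures passed in are our assumption):

    def macro_f1(gold, predicted, types=("PER", "LOC", "ORG", "MISC")):
        """gold and predicted map micropost ID -> set of (type, value) pairs.
        Returns per-type F1 and the macro-average, with beta = 1 so precision
        and recall are weighted equally."""
        per_type_f1 = {}
        for t in types:
            tp = fp = fn = 0
            for mid, gold_pairs in gold.items():
                g = {pair for pair in gold_pairs if pair[0] == t}
                p = {pair for pair in predicted.get(mid, set()) if pair[0] == t}
                tp += len(g & p)
                fp += len(p - g)
                fn += len(g - p)
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            per_type_f1[t] = (2 * precision * recall / (precision + recall)
                              if precision + recall else 0.0)
        return per_type_f1, sum(per_type_f1.values()) / len(types)
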
PRIZE
------------
The best submission to the Micropost Concept Extraction Challenge will
receive an award of (US)$1500, generously sponsored by eBay. The information
extraction challenges posed by eBay items, which often have short textual
content, are very similar to those posed by other short textual microposts.
By teaming up with eBay to make the challenge possible, the MSM workshop
organisers wish to highlight this aspect of the micropost extraction
research question.

The Challenge Committee will judge submissions based on the outcome of
the evaluation procedure described above, and a review of the extended
abstracts, to obtain insight into the quality and applicability of the
approaches taken. A selection of the accepted submissions will be presented
at the challenge. All accepted submissions will be published in a separate
CEUR compendium and made available from the workshop website.

SUBMISSIONS
------------
Submission is as a zip file named after your system (e.g. 'awesomeo9000.zip'),
containing:
1. a TSV file with your system name (e.g. 'awesomeo9000.tsv') - a formatting
sketch follows this list.
2. an extended abstract of 2 pages describing your approach and how you
tuned/tested it using the training split.
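
As a minimal sketch of writing the TSV file in the format shown under
EVALUATION (the system name and the dict of predictions are assumptions):

    def write_submission(predictions, path="awesomeo9000.tsv"):
        """predictions maps each test micropost ID (a numeric string) to a list
        of (entity_type, entity_value) pairs; the list may be empty."""
        with open(path, "w", encoding="utf-8") as out:
            for micropost_id in sorted(predictions, key=int):
                concepts = ";".join("{}/{}".format(t, v)
                                    for t, v in predictions[micropost_id])
                out.write("{}\t{}\n".format(micropost_id, concepts))
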
Written submissions should be prepared according to the ACM SIG Proceedings
Template (see http://www.acm.org/sigs/publications/proceedings-templates),
and should include author names and affiliations, and 3-5 keywords.
Submission is via the EasyChair Conference System, at:
https://www.easychair.org/conferences/?conf=msm2013challenge

IMPORTANT DATES
----------------
Challenge Data release: 17 Jan 2013
Intent to submit to challenge: 03 Mar 2013
Challenge Submission deadline: 17 Mar 2013
Challenge Notification: 31 Mar 2013
Challenge camera-ready deadline: 07 Apr 2013

(all deadlines 23:59 Hawaii Time)

Workshop program issued: 09 Apr 2013
Challenge proceedings to be published via CEUR
Workshop - 13 May 2013 (Registration open to all)

CONTACT
---------------
E-mail: msm2013-0 at easychair.org
Facebook Group: http://www.facebook.com/#!/home.php?sk=group_180472611974910
Facebook Public Event page: http://www.facebook.com/events/116134955169543
Twitter hashtag: #msm2013
W3C Microposts Community Group: http://www.w3.org/community/microposts

WORKSHOP ORGANISERS
---------------------
Matthew Rowe, Lancaster University, UK
Milan Stankovic, Université Paris-Sorbonne, France
Aba-Sah Dadzie, The University of Sheffield, UK

------------------
Challenge Chair:
A. Elizabeth Cano, KMi, The Open University, UK

Steering Committee & Local Chair:
Bernardo Pereira Nunes, PUC-Rio, Brazil / L3S Research Center, Germany

Evaluation Committee:
------------------
Naren Chittar, eBay, USA
Peter Mika, Yahoo! Research, Spain
Andrea Varga, OAK Group, University of Sheffield, UK