apologies for cross-posting
==========================================================================
Concept Extraction Challenge @
the 3rd Workshop on Making Sense of Microposts (#MSM2013)
at WWW 2013
http://oak.dcs.shef.ac.uk/msm2013/challenge.html
13th May 2013. Rio de Janeiro, Brazil
===========================================================================
#MSM2013 will host a 'Concept Extraction Challenge', with a prize
sponsored by eBay, where participants must label Microposts in a
given dataset with the concepts referenced. Existing concept extraction
tools are intended for use over news corpora and similar collections of
relatively long documents. The aim of the challenge is to foster research
into novel, more accurate concept extraction for (much shorter) Micropost
data.
The goal of the challenge is to detect concepts contained in Microposts.
Concepts are defined as abstract notions of things; for this challenge we
are constraining the task to the extraction of entity concepts
characterised by an entity type and an entity instance. We consider four
entity types, defined as follows:
1. Person (PER) - references in the Micropost to a full or partial person
name.
Example:
Obama responds to diversity criticism
Extracted instances:
PER/Obama;
2. Location (LOC) - references in the Micropost to full or partial location
names including: cities, provinces or states, countries, continents and
(physical) facilities.
Example:
Finally on the train to London ahhhh
Extracted instances:
LOC/London;
3. Organisation (ORG) - references in the Micropost to full or partial
organisation names including academic, state, governmental, military and
business or enterprise organisations.
Example:
NASA's Donated Spy Telescopes May Aid Dark Energy Search
Extracted instances:
ORG/NASA;
4. Miscellaneous (MISC) - references in the Micropost to a concept not
covered by any of the categories above, but limited to one of the entity
types: film/movie, entertainment award event, political event, programming
language, sporting event, TV show, nationality, and (spoken or written)
language.
Example:
Okay, now this is getting seriously bizarre. Like a Monty Python script
gone wrong.
Extracted instances:
MISC/Monty Python;
DATASET
-----
Two datasets covering a variety of discussion topics have been provided:
one for training and one for testing. The complete dataset (training and
testing data combined) contains 4,265 microposts, manually annotated using
the definitions above. The dataset is split 60%/40% between training and
testing.
Training Dataset
-----
A tab-separated file with the following elements per micropost:
- Element 1: The numeric ID of the micropost
- Element 2: The concepts found within the micropost, each described by an
entity type and an entity instance. These are semi-colon separated values
(e.g. PER/Obama;ORG/NASA).
- Element 3: The content of the micropost - this is the text the concepts
were detected and extracted from.
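For reference, reading the training file in Python could look like the
following sketch (the file name 'train.tsv' and the UTF-8 encoding are
assumptions, not part of the official dataset description):

def parse_concepts(field):
    # Split a semi-colon separated list of TYPE/instance pairs.
    pairs = []
    for item in field.split(';'):
        item = item.strip()
        if item:
            etype, _, instance = item.partition('/')
            pairs.append((etype, instance))
    return pairs

with open('train.tsv', encoding='utf-8') as f:
    for line in f:
        post_id, concepts, text = line.rstrip('\n').split('\t', 2)
        gold = parse_concepts(concepts)  # e.g. [('PER', 'Obama'), ('ORG', 'NASA')]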
Test Dataset
-----
Also a tab-separated file but, unlike the training dataset, the concepts
have not been extracted:
- Element 1: The numeric ID of the micropost
- Element 2: The content of the micropost - this is what you must use to
detect and extract the concepts contained.
Anonymisation and Special Terms
-----
To ensure anonymity, all username mentions in the microposts have been
replaced with '_Mention_', and all URLs with '_URL_'.
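Should you want to normalise external Twitter data in the same way, the
following Python sketch approximates the substitution (the exact patterns
used by the organisers are not published, so these regular expressions are
assumptions):

import re

def anonymise(text):
    text = re.sub(r'@\w+', '_Mention_', text)      # username mentions
    text = re.sub(r'https?://\S+', '_URL_', text)  # URLs
    return text

print(anonymise('Thanks @nasa http://example.com/x'))
# -> Thanks _Mention_ _URL_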
Data Access
-----
The datasets can be downloaded from:
http://oak.dcs.shef.ac.uk/msm2013/ie_challenge
EVALUATION
------------
In order to evaluate your submission we require you to submit (along with
a paper describing your approach) a tab-separated value (TSV) file with the
following format, one line per micropost in the test dataset:
- Element 1: The numeric ID of the micropost.
- Element 2: The entity type/entity instance pairs detected in the
micropost. These are semi-colon separated values (e.g. PER/Obama;ORG/NASA).
For instance, your results would be formatted as:
2560 PER/Obama;ORG/NASA
2561
2562 ORG/FDA;
…
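Note that a micropost with no detected concepts still gets a line, with an
empty second element (ID 2561 above). A minimal Python sketch for writing
such a file, assuming your system's output is held in a dict mapping
micropost IDs to lists of (type, instance) pairs:

results = {
    2560: [('PER', 'Obama'), ('ORG', 'NASA')],
    2561: [],                    # no concepts detected
    2562: [('ORG', 'FDA')],
}

with open('awesomeo9000.tsv', 'w', encoding='utf-8') as out:
    for post_id in sorted(results):
        field = ';'.join(f'{etype}/{inst}' for etype, inst in results[post_id])
        out.write(f'{post_id}\t{field}\n')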
This file will be parsed and the accuracy of each approach computed.
Accuracy will be judged using the f-measure (with beta = 1, so precision
and recall are weighted equally). This will be computed on a per
entity-type/entity-instance pair basis and then averaged across the four
entity types. We will also provide entity-type-specific f-measure values
for each team, to assess how each approach fares across the different
concepts.
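As an illustration, the scoring could be computed along the following
lines (a sketch assuming exact matching of type/instance pairs; the
official matching criteria may differ):

def f1(gold, predicted):
    # F-measure with beta = 1 over sets of (type, instance) pairs.
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

# Toy gold and predicted annotations, for illustration only.
gold_pairs = [('PER', 'Obama'), ('ORG', 'NASA'), ('LOC', 'London')]
pred_pairs = [('PER', 'Obama'), ('ORG', 'FDA'), ('LOC', 'London')]

types = ['PER', 'LOC', 'ORG', 'MISC']
per_type = {t: f1([p for p in gold_pairs if p[0] == t],
                  [p for p in pred_pairs if p[0] == t]) for t in types}
macro_f1 = sum(per_type.values()) / len(types)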
PRIZE
------------
The best submission to the Micropost Concept Extraction Challenge will
receive an award of US$1,500, generously sponsored by eBay. The information
extraction challenges posed by eBay items, which often have short textual
content, are very similar to those posed by other short textual microposts.
By teaming up with eBay to make the challenge possible, the MSM workshop
organisers wish to highlight this aspect of the micropost extraction
research question.
The Challenge Committee will judge submissions based on the outcome of the
evaluation procedure described above, and a review of the extended
abstracts, to obtain insight into the quality and applicability of the
approaches taken. A selection of the accepted submissions will be presented
at the challenge. All accepted submissions will be published in a separate
CEUR compendium and made available from the workshop website.
SUBMISSIONS
------------
Submissions should be made as a zip file using your system name as the
file name (e.g. 'awesomeo9000.zip'), containing:
1. a TSV file with your system name (e.g. 'awesomeo9000.tsv');
2. an extended abstract of 2 pages describing your approach and how you
tuned/tested it using the training split.
Written submissions should be prepared according to the ACM SIG Proceedings
Template (see http://www.acm.org/sigs/publications/proceedings-templates),
and should include author names and affiliations, and 3-5 keywords.
Submission is via the EasyChair Conference System, at:
https://www.easychair.org/conferences/?conf=msm2013challenge
IMPORTANT DATES
----------------
Challenge Data release: 17 Jan 2013
Intent to submit to challenge: 03 Mar 2013
Challenge Submission deadline: 17 Mar 2013
Challenge Notification: 31 Mar 2013
Challenge camera-ready deadline: 07 Apr 2013
(all deadlines 23:59 Hawaii Time)
Workshop program issued: 09 Apr 2013
Challenge proceedings to be published via CEUR
Workshop - 13 May 2013 (Registration open to all)
CONTACT
---------------
E-mail: msm2013-0@easychair.org
Facebook Group: http://www.facebook.com/#!/home.php?sk=group_180472611974910
Facebook Public Event page: http://www.facebook.com/events/116134955169543
Twitter hashtag: #msm2013
W3C Microposts Community Group: http://www.w3.org/community/microposts
WORKSHOP ORGANISERS
---------------------
Matthew Rowe, Lancaster University, UK
Milan Stankovic, Université Paris-Sorbonne, France
Aba-Sah Dadzie, The University of Sheffield, UK
------------------
Challenge Chair:
A. Elizabeth Cano, KMi, The Open University, UK
Steering Committee & Local Chair:
Bernardo Pereira Nunes, PUC-Rio, Brazil / L3S Research Center, Germany
Evaluation Committee:
Naren Chittar, eBay, USA
Peter Mika, Yahoo! Research, Spain
Andrea Varga, OAK Group, University of Sheffield, UK