Arabic-L:LING:SEMEVAL Task #18: Arabic Semantic Labeling

Mon Mar 26 18:44:39 UTC 2007

------------------------------------------------------------------------
Arabic-L: Mon 26 Mar 2007
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
            unsubscribe arabic-l                                      ]

-------------------------Directory------------------------------------

1) Subject:SEMEVAL Task #18: Arabic Semantic Labeling

-------------------------Messages-----------------------------------
1)
Date: 26 Mar 2007
From:"Mona Diab" <mdiab at cs.columbia.edu>
Subject:SEMEVAL Task #18: Arabic Semantic Labeling

[APOLOGIES FOR DUPLICATES]
The training and test data are now ready for download from the main
SEMEVAL webpage at http://nlp.cs.swarthmore.edu/semeval/

The relevant dates are listed on the webpage.

Below is a description of the task:

Tasks:

We propose several tasks for Arabic Semantic Labeling. The tasks will
span both the WSD and Semantic Role Labeling processes for this
evaluation. Both sets of tasks will be evaluated on data derived from
the same data set, the test set.

We propose 3 subtasks for WSD, all of which will have only test data for
evaluation and trial data for formatting purposes. The data will be
taken from the Arabic Treebank 3v2 text data, roughly 3000 words long:

1.       The first task is to discover different senses in the data for
nouns and verbs without associating labels with those senses. Therefore
it is a sense discrimination task.

In this task the participants will be required to identify the number of
different senses for nouns and verbs without associating labels with
those identified senses. The assumption is that each word occurrence
carries one of the identified senses. These senses will be derived from
the Arabic WordNet, which correspond to English WN 2.0. There will be
two levels of granularity, coarse and fine grained.

2.       The second task is to annotate all nouns and verbs in the data
with Arabic WordNet senses (provided with the test data, and also
accessible via the web at http://www.globalwordnet.org/AWN). All verbs
and nouns in the data will need to be annotated with their sense indices
and/or offsets from Arabic WordNet.

3.       The third task is to annotate all nouns and verbs in the data
with English WordNet senses.
a.       In this task, the participants will be required to link the
Arabic nouns and verbs with their corresponding sense(s) in the English
WordNet 2.0.
b.       An English translation corpus will be provided along with the
trial/test data.
c.       A bilingual word list will also be provided.

We propose 2 subtasks for Semantic Role Labeling (SRL). These subtasks
will have trial, training and test data available:

4.       Identifying arguments in a sentence
In this task, the participants are required to identify all the
constituents in a constituency tree that should be annotated with
argument roles related to some predetermined verbs.

5.       Automatic annotations for all arguments
In this task, the participants are required to identify and label all
the constituents in a constituency tree that should be annotated with
both numbered argument roles and ARGM roles related to some
predetermined verbs.
Data

The data will be Arabic Treebank 3 v.2 data, which is newswire in Modern
Standard Arabic. The data will be presented in ASCII encoding, using the
Buckwalter transliteration scheme. The data will be unvowelised and
tokenized according to the Arabic Treebank clitic tokenization scheme.
We will provide code for conversion of encoding from UTF-8 and CP1256 to
the Buckwalter transliteration scheme. Moreover, we will provide code
for the tokenization, POS tagging and Base Phrase chunking of the Arabic
text; a package can be downloaded from
http://www.cs.columbia.edu/~mdiab/ASVMTools.tar.gz.
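
For readers unfamiliar with the scheme, here is a minimal sketch of a
UTF-8/CP1256 to Buckwalter conversion. It is not the code from the
package above; the function names are hypothetical, and treating
"unvowelised" as the removal of the marks U+064B to U+0652 is an
assumption.

```python
# Standard Buckwalter table, written as code points to avoid encoding issues.
_ARABIC = (
    "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062A"
    "\u062B\u062C\u062D\u062E\u062F\u0630\u0631\u0632\u0633\u0634"
    "\u0635\u0636\u0637\u0638\u0639\u063A\u0640\u0641\u0642\u0643"
    "\u0644\u0645\u0646\u0647\u0648\u0649\u064A\u064B\u064C\u064D"
    "\u064E\u064F\u0650\u0651\u0652\u0670\u0671"
)
_BUCKWALTER = "'|>&<}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{"
assert len(_ARABIC) == len(_BUCKWALTER)

_TABLE = str.maketrans(_ARABIC, _BUCKWALTER)

# Assumption: "unvowelised" means dropping fathatan .. sukun (incl. shadda).
_DIACRITICS = dict.fromkeys(range(0x064B, 0x0653))


def to_buckwalter(text, unvowelise=True):
    """Transliterate a Unicode Arabic string to Buckwalter ASCII."""
    if unvowelise:
        text = text.translate(_DIACRITICS)  # mapping to None deletes
    return text.translate(_TABLE)


def bytes_to_buckwalter(raw, encoding="utf-8", unvowelise=True):
    """Decode UTF-8 or CP1256 bytes, then transliterate."""
    return to_buckwalter(raw.decode(encoding), unvowelise)


word = "\u0643\u0650\u062A\u0627\u0628"  # the word for "book", with a kasra
print(to_buckwalter(word))                    # unvowelised form
print(to_buckwalter(word, unvowelise=False))  # vowelled form
print(bytes_to_buckwalter(word.encode("cp1256"), encoding="cp1256"))
```

Characters outside the table (digits, punctuation, Latin text) pass
through unchanged, which may or may not match the conventions of the
released data.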

We will use only the 100 most frequent verbs in this set to draw
training and trial data (for the semantic role labeling tasks) and test
data (for the semantic role labeling and WSD tasks).
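
A selection of this kind can be sketched as follows. How the organizers
actually counted verbs (by lemma or by surface form, and with which tag
set) is not stated here, so the (word, tag) input format and the
"tag starts with V" test are assumptions.

```python
from collections import Counter


def most_frequent_verbs(tagged_tokens, n=100,
                        is_verb=lambda tag: tag.startswith("V")):
    """Return the n most frequent verb forms in a list of (word, tag) pairs."""
    counts = Counter(word for word, tag in tagged_tokens if is_verb(tag))
    return [word for word, _ in counts.most_common(n)]


# Toy input in Buckwalter transliteration (hypothetical, not task data).
tokens = [("ktb", "VBD"), ("Alwld", "NN"), ("ktb", "VBD"), ("qAl", "VBD")]
print(most_frequent_verbs(tokens, n=2))
```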

The data is manually annotated for syntax and morphology. The syntactic
trees are constituency trees.

A preliminary version of the Arabic WordNet will be available.

Evaluation metric

SRL: Conlleval metrics of precision, recall and f-measure.

WSD: Scorer 2.0 metrics of precision, recall and f-measure on both
coarse and fine grained sense distinctions.
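
For orientation, the three quantities can be computed as in the sketch
below. This is not the Conlleval or Scorer 2.0 code itself; it assumes
each answer is an exact-match item (for example a span plus role label,
or a token plus sense) and uses the balanced f-measure.

```python
def precision_recall_f(gold, predicted):
    """Exact-match precision, recall and balanced f-measure over two sets."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical labeled arguments: (first token, last token, role).
gold = {(1, 1, "ARG0"), (2, 2, "ARG1")}
predicted = {(1, 1, "ARG0"), (2, 2, "ARG2"), (0, 0, "ARGM-TMP")}
print(precision_recall_f(gold, predicted))
```

Here one of three predictions is correct and one of two gold items is
found, so precision is 1/3, recall is 1/2 and the f-measure is 0.4.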

************************************************************************
************************************************************************
Mona T. Diab, PhD
Center for Computational Learning Systems
Computational Linguistics Group
Columbia University
Tel.: +1 212 870 1290
Fax: +1 212 870 1285

------------------------------------------------------------------------
End of Arabic-L:  26 Mar 2007