Arabic-L:LING:SEMEVAL Task #18: Arabic Semantic Labeling
Dilworth Parkinson
dilworth_parkinson at BYU.EDU
Mon Mar 26 18:44:39 UTC 2007
------------------------------------------------------------------------
Arabic-L: Mon 26 Mar 2007
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
unsubscribe arabic-l ]
-------------------------Directory------------------------------------
1) Subject:SEMEVAL Task #18: Arabic Semantic Labeling
-------------------------Messages-----------------------------------
1)
Date: 26 Mar 2007
From:"Mona Diab" <mdiab at cs.columbia.edu>
Subject:SEMEVAL Task #18: Arabic Semantic Labeling
[APOLOGIES FOR DUPLICATES]
The training and test data are now ready for download from the main
SEMEVAL webpage at http://nlp.cs.swarthmore.edu/semeval/
The relevant dates are listed on the webpage.
Below is a description of the task:
Tasks:
We propose several tasks for Arabic Semantic Labeling. The tasks will
span both the WSD and Semantic Role Labeling processes for this
evaluation. Both sets of tasks will be evaluated on data derived from
the same data set, the test set.
We propose 3 subtasks for WSD, all of which will have only test data for
evaluation and trial data for formatting purposes. The data will be
taken from the Arabic Treebank 3 v.2 text data and is roughly 3000 words
long:
1. The first task is to discover the different senses in the data for
nouns and verbs without associating labels with those senses. It is
therefore a sense discrimination task.
In this task the participants will be required to identify the number
of different senses for nouns and verbs without associating labels with
the identified senses. The assumption is that each word is used in one
of the identified senses. These senses will be derived from the Arabic
WordNet, which corresponds to English WordNet 2.0. There will be two
levels of granularity, coarse and fine grained.
2. The second task is to annotate all nouns and verbs in the data with
Arabic WordNet senses (provided with the test data, and also accessible
via the web at http://www.globalwordnet.org/AWN).
All verbs and nouns in the data will need to be annotated with their
sense indices and/or offsets from Arabic WordNet.
3. The third task is to annotate all nouns and verbs in the data with
English WordNet senses.
a. In this task, the participants will be required to link the Arabic
nouns and verbs with their corresponding sense(s) in the English
WordNet 2.0.
b. An English translation corpus will be provided along with the
trial/test data.
c. A bilingual word list will also be provided.
We propose 2 subtasks for Semantic Role Labeling (SRL). These subtasks
will have trial, training and test data available:
4. Identifying arguments in a sentence
In this task, the participants are required to identify all the
constituents in a constituency tree that should be annotated with
argument roles related to some predetermined verbs.
5. Automatic annotations for all arguments
In this task, the participants are required to identify and label all
the constituents in a constituency tree that should be annotated with
both numbered argument roles and ARGM roles related to some
predetermined verbs.
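To make subtasks 4 and 5 concrete, here is a minimal Python sketch of
what "constituents in a constituency tree" means as candidate spans. The
bracketed tree, its tag names, the example sentence and the role
assignments are all hypothetical illustrations, not the official data
format or the organizers' code.

```python
def parse_tree(text):
    """Parse a bracketed tree such as "(NP (NN Alwld))" into (label, children)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def constituents(tree, start=0, out=None):
    """Collect (label, start, end) word spans, end exclusive.

    Returns the list of spans and the end offset of this subtree.
    """
    if out is None:
        out = []
    label, children = tree
    end = start
    for child in children:
        if isinstance(child, str):
            end += 1
        else:
            _, end = constituents(child, end, out)
    out.append((label, start, end))
    return out, end


# Hypothetical sentence in Buckwalter transliteration: "ktb Alwld AlrsAlp".
tree = parse_tree("(S (VP (VBD ktb) (NP (NN Alwld)) (NP (NN AlrsAlp))))")
spans, _ = constituents(tree)

# Subtask 4 asks which spans are arguments of the verb (here "ktb");
# subtask 5 additionally asks for the role label of each such span.
# The roles below are an assumed annotation, for illustration only.
assumed_arguments = {(1, 2): "ARG0", (2, 3): "ARG1"}
for label, start, end in spans:
    role = assumed_arguments.get((start, end))
    if role and label == "NP":
        print(label, start, end, role)
```

In this sketch a system for subtask 4 would output only the spans, while
a system for subtask 5 would output the spans together with their roles.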
Data
The data will be Arabic Treebank 3 v.2 data, which is newswire in Modern
Standard Arabic. The data will be presented in ASCII encoding, using the
Buckwalter transliteration scheme. The data will be unvowelised and
tokenized according to the Arabic Treebank clitic tokenization scheme.
We will provide code for conversion of encoding from UTF-8 and CP1256 to
the Buckwalter transliteration scheme. Moreover, we will provide code
for the tokenization, POS tagging and Base Phrase chunking of the Arabic
text. The package can be downloaded from
http://www.cs.columbia.edu/~mdiab/ASVMTools.tar.gz.
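For readers unfamiliar with the encoding, the following Python sketch
shows roughly what a conversion from Unicode or CP1256 text to
unvowelised Buckwalter transliteration involves. It is not the
conversion code the organizers provide: the function names are made up
for this example, the table follows the commonly published Buckwalter
mapping, and treating every diacritic (including shadda) as something to
strip is an assumption about what "unvowelised" means here.

```python
# Buckwalter transliteration table: Arabic code point -> ASCII symbol.
BUCKWALTER = {
    "\u0621": "'", "\u0622": "|", "\u0623": ">", "\u0624": "&",
    "\u0625": "<", "\u0626": "}", "\u0627": "A", "\u0628": "b",
    "\u0629": "p", "\u062A": "t", "\u062B": "v", "\u062C": "j",
    "\u062D": "H", "\u062E": "x", "\u062F": "d", "\u0630": "*",
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$",
    "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0638": "Z",
    "\u0639": "E", "\u063A": "g", "\u0640": "_", "\u0641": "f",
    "\u0642": "q", "\u0643": "k", "\u0644": "l", "\u0645": "m",
    "\u0646": "n", "\u0647": "h", "\u0648": "w", "\u0649": "Y",
    "\u064A": "y", "\u0671": "{",
    # Short vowels and other diacritics.
    "\u064B": "F", "\u064C": "N", "\u064D": "K", "\u064E": "a",
    "\u064F": "u", "\u0650": "i", "\u0651": "~", "\u0652": "o",
    "\u0670": "`",
}

# Assumption: "unvowelised" means all of these marks are dropped.
DIACRITICS = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652\u0670")


def to_buckwalter(text, strip_diacritics=True):
    """Transliterate a Unicode string; unmapped characters pass through."""
    out = []
    for ch in text:
        if strip_diacritics and ch in DIACRITICS:
            continue
        out.append(BUCKWALTER.get(ch, ch))
    return "".join(out)


def cp1256_to_buckwalter(raw, strip_diacritics=True):
    """Decode CP1256 bytes with Python's built-in codec, then transliterate."""
    return to_buckwalter(raw.decode("cp1256"), strip_diacritics)


# The word for "book" (kaf, ta, alef, ba) becomes "ktAb".
print(to_buckwalter("\u0643\u062A\u0627\u0628"))
```

UTF-8 input can be handled the same way by decoding the bytes with
"utf-8" before calling to_buckwalter.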
We will use only the 100 most frequent verbs in this set to draw the
training, trial (for the semantic role labeling tasks) and test data for
the semantic role labeling and WSD tasks.
The data is syntactically and morphologically manually annotated. The
syntactic trees are constituency trees.
A preliminary version of the Arabic WordNet will be available.
Evaluation metric
SRL: Conlleval metrics of precision, recall and F-measure
WSD: Scorer 2.0 metrics of precision, recall and F-measure on both
coarse and fine grained sense distinctions.
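As a reminder of how these three figures relate, here is a small Python
sketch that computes precision, recall and F-measure over sets of
labeled items. It is neither the conlleval script nor Scorer 2.0, whose
exact matching rules may differ; the function name and the tuple layout
(start, end, label) are assumptions made for this example, and the
F-measure shown is the balanced (F1) variant.

```python
def precision_recall_f1(gold, predicted):
    """Compare two collections of hashable items, e.g. (start, end, label)."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold and system annotations for one sentence.
gold = {(1, 2, "ARG0"), (2, 3, "ARG1"), (3, 5, "ARGM-TMP"), (5, 6, "ARG2")}
system = {(1, 2, "ARG0"), (2, 3, "ARG1"), (3, 4, "ARGM-LOC")}

p, r, f = precision_recall_f1(gold, system)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

An item counts as correct here only if both its span and its label match
the gold annotation exactly.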
************************************************************************
Mona T. Diab, PhD
Center for Computational Learning Systems
Computational Linguistics Group
Columbia University
Tel.: +1 212 870 1290
Fax: +1 212 870 1285
------------------------------------------------------------------------
End of Arabic-L: 26 Mar 2007