<HTML>

<HEAD>

<TITLE>Call for participation: SEMEVAL Task #18: Arabic Semantic labeling</TITLE>

</HEAD>

<BODY>

<FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>

<BR>

<BR>

</SPAN></FONT><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'>[APOLOGIES FOR MULTIPLE POSTINGS]<BR>

 <BR>

The training and test data are now available for download from the main SEMEVAL webpage at <a href="http://nlp.cs.swarthmore.edu/semeval/">http://nlp.cs.swarthmore.edu/semeval/</a>.<BR>

 <BR>

The relevant dates are listed on the webpage.<BR>

 <BR>

Below is a description of the task:<BR>

 <BR>

<FONT COLOR="#000080"><B><U>Tasks: <BR>

</U></B> <BR>

We propose several tasks for Arabic Semantic Labeling. The tasks span both word sense disambiguation (WSD) and Semantic Role Labeling. Both sets of tasks will be evaluated on data derived from the same data set, the test set. <BR>

 <BR>

We propose 3 subtasks for WSD, all of which will have only test data for evaluation and trial data for formatting purposes. The data will be taken from the Arabic Treebank 3 v.2 text data and will be roughly 3000 words long:<BR>

           <BR>

</FONT>1.</SPAN></FONT></FONT><FONT SIZE="2"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:10.0px'>       </SPAN></FONT></FONT><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'>The first task is to discover different senses in the data for nouns and verbs without associating labels with those senses. Therefore it is a sense discrimination task.<BR>

<FONT COLOR="#000080">In this task, the participants will be required to identify the number of distinct senses for nouns and verbs without associating labels with the identified senses. The assumption is that each word occurrence belongs to one of the identified senses. The senses will be derived from the Arabic WordNet, which corresponds to English WN 2.0. There will be two levels of granularity, coarse and fine. <BR>

           <BR>

</FONT>2.</SPAN></FONT></FONT><FONT SIZE="2"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:10.0px'>       </SPAN></FONT></FONT><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'>The second task is to annotate all nouns and verbs in the data with Arabic WordNet senses <FONT COLOR="#000080">(provided with the test data, and also accessible via the web at http://</FONT></SPAN></FONT><SPAN STYLE='font-size:13.0px'><FONT FACE="Verdana, Helvetica, Arial">www.globalwordnet.org/AWN).<BR>

</FONT><FONT COLOR="#000080"><FONT FACE="Arial">All verbs and nouns in the data will need to be annotated with their sense indices and/or offsets from the Arabic WordNet.<BR>

 <BR>

3.</FONT></FONT></SPAN></FONT><FONT COLOR="#000080"><FONT SIZE="2"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:10.0px'>       </SPAN></FONT></FONT></FONT><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'>The third task is to annotate all nouns and verbs in the data with English WordNet senses.<BR>

a.</SPAN></FONT></FONT><FONT SIZE="2"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:10.0px'>       </SPAN></FONT></FONT><FONT COLOR="#000080"><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'>In this task, the participants will be required to link the Arabic nouns and verbs with their corresponding sense(s) in the English WordNet<B><I> </I></B>2.0<BR>

</SPAN></FONT></FONT></FONT><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'>b.</SPAN></FONT></FONT><FONT SIZE="2"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:10.0px'>       </SPAN></FONT></FONT><FONT COLOR="#000080"><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'>An English translation corpus will be provided along with the trial/test data<BR>

</SPAN></FONT></FONT></FONT><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'>c.</SPAN></FONT></FONT><FONT SIZE="2"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:10.0px'>       </SPAN></FONT></FONT><FONT COLOR="#000080"><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'>A bilingual word list will also be provided </SPAN></FONT></FONT></FONT><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'> <BR>

 <BR>

<FONT COLOR="#000080"> We propose 2 subtasks for Semantic Role Labeling (SRL). Trial, training and test data will be available for these subtasks:<BR>

 <BR>

4.</FONT></SPAN></FONT></FONT><FONT COLOR="#000080"><FONT SIZE="2"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:10.0px'>       </SPAN></FONT></FONT></FONT><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'>Identifying Arguments in a sentence<BR>

<FONT COLOR="#000080">In this task, the participants are required to identify all the constituents in a constituency tree that should be annotated with argument roles for a predetermined set of verbs.<BR>

 <BR>

 <BR>

5.</FONT></SPAN></FONT></FONT><FONT COLOR="#000080"><FONT SIZE="2"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:10.0px'>       </SPAN></FONT></FONT></FONT><FONT SIZE="4"><FONT FACE="Arial"><SPAN STYLE='font-size:13.0px'>Automatic annotations for all arguments<BR>

<FONT COLOR="#000080">In this task, the participants are required to identify and label all the constituents in a constituency tree that should be annotated with either numbered argument roles or ARGM roles for a predetermined set of verbs.<BR>

 <BR>

</FONT><B><U>Data<BR>

</U></B> <BR>

The data will be Arabic Treebank 3 <FONT COLOR="#000080">v.2 </FONT>data, which is newswire<FONT COLOR="#000080"> in Modern Standard Arabic. The data will be presented in ASCII encoding, using the Buckwalter transliteration scheme. The data will be unvowelised and tokenized according to the Arabic Treebank clitic tokenization scheme. We will provide code for converting UTF-8 and CP1256 encoded text to the Buckwalter transliteration scheme. Moreover, we will provide code for the tokenization, POS tagging and Base Phrase chunking of the Arabic text; the package can be downloaded from <a href="http://www.cs.columbia.edu/~mdiab/ASVMTools.tar.gz">http://www.cs.columbia.edu/~mdiab/ASVMTools.tar.gz</a>.<BR>
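 <BR>
For illustration only, the conversion from UTF-8 or CP1256 text to Buckwalter transliteration described above might look like the following minimal Python sketch. It is not the organizers' conversion code: the names BUCKWALTER, to_buckwalter and convert_bytes are hypothetical, and the table is deliberately partial (plain letters only, with hamza carriers and diacritics left out).<BR>
<PRE>
```python
# Partial Buckwalter table: Arabic code point to ASCII symbol.
# Assumption: hamza carriers and diacritics are omitted for brevity,
# so this is an illustrative subset, not the full scheme.
BUCKWALTER = {
    "\u0627": "A", "\u0628": "b", "\u0629": "p", "\u062A": "t",
    "\u062B": "v", "\u062C": "j", "\u062D": "H", "\u062E": "x",
    "\u062F": "d", "\u0630": "*", "\u0631": "r", "\u0632": "z",
    "\u0633": "s", "\u0634": "$", "\u0635": "S", "\u0636": "D",
    "\u0637": "T", "\u0638": "Z", "\u0639": "E", "\u063A": "g",
    "\u0641": "f", "\u0642": "q", "\u0643": "k", "\u0644": "l",
    "\u0645": "m", "\u0646": "n", "\u0647": "h", "\u0648": "w",
    "\u0649": "Y", "\u064A": "y",
}


def to_buckwalter(text):
    """Map each known Arabic letter; pass other characters through unchanged."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


def convert_bytes(raw, encoding="utf-8"):
    """Decode raw bytes (for example "utf-8" or "cp1256"), then transliterate."""
    return to_buckwalter(raw.decode(encoding))


word = "\u0643\u062A\u0627\u0628"  # the Arabic word for "book"
print(to_buckwalter(word))
print(convert_bytes(word.encode("cp1256"), encoding="cp1256"))
```
</PRE>
Characters that are missing from the partial table pass through unchanged, so a real converter would need the complete mapping before its output could be treated as pure ASCII.<BR>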

</FONT> <BR>

We will use only the 100 most frequent verbs in this set<FONT COLOR="#000080"> to draw the training and trial data (for the semantic role labeling tasks) and the test data (for the semantic role labeling and WSD tasks).<BR>

The data is manually annotated for syntax and morphology.</FONT> <FONT COLOR="#000080">The syntactic trees are constituency trees.<BR>

A preliminary version of the Arabic WordNet will be available.<BR>

</FONT> <BR>

<B><U>Evaluation metric<BR>

</U></B> <BR>

<FONT COLOR="#000080">SRL: </FONT>Conlleval metrics of precision, recall and F-measure<BR>

<FONT COLOR="#000080">WSD: Scorer<B><I> </I></B>2.0 metrics of precision, recall and F-measure on both coarse- and fine-grained sense distinctions.<BR>
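 <BR>
As a rough illustration of the precision, recall and F-measure named above, the following Python sketch applies the standard definitions to sets of labeled items. It does not reproduce the Conlleval or Scorer 2.0 programs; the function name precision_recall_f1 and the example argument spans are hypothetical.<BR>
<PRE>
```python
def precision_recall_f1(gold, predicted):
    """Score predicted items against gold items using exact matching."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold.intersection(predicted))
    # Guard against empty sets to avoid division by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical argument spans: (start token, end token, role label).
gold = {(0, 1, "ARG0"), (3, 5, "ARG1"), (6, 8, "ARGM-TMP")}
pred = {(0, 1, "ARG0"), (3, 5, "ARG2"), (6, 8, "ARGM-TMP"), (9, 9, "ARG1")}

p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```
</PRE>
In this sketch a predicted span counts as correct only if both its boundaries and its role label match a gold span, so the mislabeled span (3, 5) counts against both precision and recall.<BR>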

</FONT>************************************************************************************************************************************************************** <BR>

 <BR>

</SPAN></FONT></FONT><FONT SIZE="5"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:16.0px'>Mona T. Diab, PhD<BR>

</SPAN></FONT></FONT><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>

</SPAN></FONT><FONT SIZE="5"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:16.0px'>Center for Computational Learning Systems<BR>

</SPAN></FONT></FONT><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>

</SPAN></FONT><FONT SIZE="5"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:16.0px'>Computational Linguistics Group<BR>

</SPAN></FONT></FONT><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>

</SPAN></FONT><FONT SIZE="5"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:16.0px'>Columbia</SPAN></FONT></FONT><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'> University<BR>

<BR>

</SPAN></FONT><FONT SIZE="5"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:16.0px'> <BR>

</SPAN></FONT></FONT><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>

</SPAN></FONT><FONT SIZE="5"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:16.0px'>Tel.: +1 212 870 1290<BR>

</SPAN></FONT></FONT><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>

</SPAN></FONT><FONT SIZE="5"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:16.0px'>Fax: +1 212 870 1285<BR>

</SPAN></FONT></FONT><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>

</SPAN></FONT><FONT SIZE="5"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:16.0px'> <BR>

</SPAN></FONT></FONT><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>

<BR>

------ End of Forwarded Message<BR>

<BR>

<BR>


</SPAN></FONT>

</BODY>

</HTML>