[Corpora-List] Call for Participation: Coling Workshop on Arabic Script Languages

Karine Megerdoomian karinem at inxight.com
Wed Jul 14 17:24:15 UTC 2004


		    ** Call for Participation **

		COLING 2004 WORKSHOP ON
COMPUTATIONAL APPROACHES TO ARABIC SCRIPT-BASED LANGUAGES

	         Geneva, Switzerland, 23-27 August 2004
	     Invited Speaker: Martin Kay (Stanford University)
	      http://members.cox.net/karinem/COLING2004



WORKSHOP THEME 

Recently, there has been a surge of interest in the study of the languages of the Middle East, especially Arabic, Persian (Farsi), Pashto, Kurdish and Urdu. The use of the Arabic script gives rise to certain issues that are common to all these languages even though they belong to distinct language families. Hence, these languages share properties such as the absence of capitalization, right-to-left direction, lack of clear word boundaries, complex word structure, a high degree of ambiguity due to non-representation of short vowels in the writing system, and related encoding issues. Yet research on these languages has rarely been brought together in a single forum, and most development has been the result of initiatives by individual research establishments or industry firms.

The goal of this workshop is to provide a forum for those involved in the development of NLP systems in Arabic script languages to exchange ideas, approaches and implementations of computational systems; to discuss the common challenges faced by all practitioners; and to assess the state of the art in the field. In addition, one of the aims of the workshop is to identify promising areas for future collaborative research in the development of NLP systems for Arabic script languages. 


WORKSHOP PROGRAM 

I. Opening and Overview
8:30-9:00 Computer Processing of Arabic Script-based Languages: Current State and Future Directions - Ali Farghaly 

II. Session 1: Lexicon and Corpora
9:00-9:30 Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools - Mohamed Maamouri and Ann Bies 
9:30-10:00 Preliminary Lexical Framework for English-Arabic Semantic Resource Construction - Anne R. Diekema 
10:00-10:30 The Architecture of a Standard Arabic Lexical Database: Some Figures, Ratios, and Categories from the DIINAR.1 Source Program - Ramzi Abbès, Joseph Dichy and Mohamed Hassoun 

10:30-10:45 Break

III. Session 2: Morphology
10:45-11:15 Systematic Verb Stem Generation for Arabic - Jim Yaghi and Sane Yagi 
11:15-11:45 Issues in Arabic Orthography and Morphology Analysis - Tim Buckwalter 
11:45-12:15 Finite-State Morphological Analysis of Persian - Karine Megerdoomian 

12:15-2:00 Lunch & Demo Sessions

IV. Demonstrations 
Urdu Localization Project - Sarmad Hussain 
FarsiSum: A Persian Text Summarizer - Martin Hassel and Nima Mazdak 
Stemming the Qur'an - Naglaa Thabet 
Language Weaver Arabic->English MT - Daniel Marcu, Alex Fraser, William Wong and Kevin Knight 

V. Invited Speaker  
2:00-2:45 Arabic Script-Based Languages Deserve to be Studied Linguistically - Martin Kay 

VI. Session 3: Statistical Approaches
2:45-3:15 An Unsupervised Approach for Bootstrapping Arabic Sense Tagging - Mona T. Diab 
3:15-3:45 Automatic Arabic Document Categorization Based on the Naive Bayes Algorithm - Mohamed El Kourdi, Amine Bensaid and Tajje-eddine Rachidi 

3:45-4:00 Break

VII. Session 4: Speech Processing 
4:00-4:30 A Transcription Scheme for Languages Employing the Arabic Script Motivated by Speech Processing Applications - Shadi Ganjavi, Panayiotis G. Georgiou and Shrikanth Narayanan 
4:30-5:00 Automatic Diacritization of Arabic for Acoustic Modeling in Speech Recognition - Dimitra Vergyri and Katrin Kirchhoff 
5:00-5:30 Letter-to-Sound Conversion for Urdu Text-to-Speech System - Sarmad Hussain 

VIII. Discussion and Closing
5:30-6:00 Ali Farghaly and Karine Megerdoomian 

Accepted papers and formal demonstrations will be published in a proceedings volume, which will be made available at the workshop. 



WORKSHOP REGISTRATION 

For the workshop to take place, the COLING 2004 organizers require at least 20 participants to register for it. Speakers and participants are therefore asked to register as soon as possible via the official COLING 2004 website at http://www.issco.unige.ch/coling2004/.

Workshop fees (in Swiss Francs): 
* Student early CHF 90
* Student late CHF 120
* Student on-site CHF 150
* Regular early CHF 120
* Regular late CHF 150
* Regular on-site CHF 180


ORGANIZING COMMITTEE 
 
Ali Farghaly (SYSTRAN Software, Inc.) 
Karine Megerdoomian (Inxight Software and University of California, San Diego) 


PROGRAM COMMITTEE 

Jan W. Amtrup (Bowne Global Solutions) 
Tim Buckwalter (Linguistic Data Consortium) 
Miriam Butt (Konstanz University, Germany) 
Violetta Cavalli-Sforza (Carnegie Mellon University) 
Joseph Dichy (Lyon University) 
Abdelkadir Fassi Fehri (Mohammed V University-Souissi Rabat, Morocco) 
Andrew Freeman (University of Washington) 
Nizar Habash (University of Maryland, College Park) 
Masayo Iida (Inxight Software, Inc) 
Simin Karimi (University of Arizona) 
Martin Kay (Stanford University) 
Kevin Knight (USC/Information Sciences Institute) 
Farhad Oroumchian (University of Wollongong in Dubai) 
Ahmed Rafea (The American University in Cairo) 
Jean Senellart (SYSTRAN Software) 
Bonnie Glover Stalls (University of Southern California) 
Rémi Zajac (SYSTRAN Software) 


