[Corpora-List] 1st CFP: Workshop on Free/Open-Source Arabic Corpora and Corpora Processing Tools (LTEC2014)

AbdulMohsen Al-Thubaity PhD, PMP althubaity at gmail.com
Tue Dec 17 09:59:59 UTC 2013


 

Workshop on Free/Open-Source Arabic Corpora and Corpora Processing Tools

 

Workshop URL: http://www.kacstac.org.sa/osact/index.html

 

 

Workshop description 

 

In the Natural Language Processing (NLP) and Computational Linguistics (CL)
communities, Arabic has long been regarded as a resource-poor language. This
was thought to be the reason for the lack of corpus-based studies of Arabic.
However, recent years have witnessed the emergence of a considerable number
of new free Arabic corpora and, to a lesser extent, Arabic corpus processing
tools.

 

Freely available Arabic corpora can be divided into two groups. The first
group contains large Arabic corpora designed and constructed primarily for
Arabic linguistics research and related activities, and possibly for Arabic
NLP. These corpora are diverse in the genres they cover, and their sizes
range from one million words to 700 million words. The second group contains
corpora designed primarily for Arabic text classification and clustering;
they mainly contain newspaper articles and range from less than 1 million
words to 11 million words.

 

Some Arabic corpora, mostly the large ones, can be explored on the web using
different tools, while others are available only for download. For
downloadable corpora, the user may need standalone corpus processing tools,
which provide functions such as word frequency lists, concordances,
collocations, etc. Yet despite the availability of large and diverse Arabic
corpora, the situation has not changed: there is still a lack of Arabic
corpus-based studies. Is this because of the representativeness of these
corpora? The functions and tools available with them? Or is it because they
are not well enough known in the Arabic linguistics community?


 

 

Motivation and topics of interest

 

This half-day workshop aims to encourage researchers and developers to
foster the use of freely available Arabic corpora and open-source Arabic
corpus processing tools, to help highlight the drawbacks of these resources,
and to discuss techniques and approaches for improving them. The workshop
topics include, but are not limited to:

1.      Surveying and critiquing the design of freely available Arabic
corpora, their associated tools, and standalone Arabic corpus processing
tools.

2.      The applications and uses of freely available Arabic language
resources in fields such as Arabic language education (e.g., L1 and L2).

3.      Arabic language modeling.

4.      Corpus-based Arabic lexicography.

5.      Lexical semantics and word sense.

6.      Corpus-based Arabic syntax.

7.      Corpus-based Arabic morphology.

8.      Development of Arabic mobile applications based on the available
Arabic language resources.

9.      Evaluation and assessment of Arabic Corpora and Corpora Processing
Tools.

10.   Future directions of Free/Open Arabic Corpora and Corpora Processing
Tools.

 

 

Important Dates

 

Submission deadline: 10 February 2014

Notification of acceptance: 10 March 2014

Final submission of manuscripts: 21 March 2014

Workshop date: 27 May 2014 (morning session)  

 

Submission guidelines

The language of the workshop is English, and submissions should follow the
LREC 2014 paper submission instructions. All papers will be peer reviewed,
possibly by three independent referees. Papers must be submitted
electronically in PDF format through the START system. When submitting a
paper from the START page, authors will be asked to provide essential
information about resources (in a broad sense, i.e. also technologies,
standards, evaluation kits, etc.) that have been used for the work described
in the paper or are a new result of their research. Moreover, ELRA
encourages all LREC authors to share the described LRs (data, tools,
services, etc.) to enable their reuse and the replicability of experiments,
including evaluation experiments.


Organising Committee

 

Hend Al-Khalifa, King Saud University, KSA

Abdulmohsen Al-Thubaity, King Abdul Aziz City for Science and Technology,
KSA

 

Program Committee

 

Eric Atwell, University of Leeds, UK 

Khaled Shaalan, The British University in Dubai (BUiD), UAE 

Dilworth Parkinson, Brigham Young University, USA

Nizar Habash, Columbia University, USA 

Khurshid Ahmad, Trinity College Dublin, Ireland

Abdulmalik AlSalman, King Saud University, KSA  

Maha Alrabiah, King Saud University, KSA

Saleh Alosaimi, Imam University, KSA

Sultan Almujaiwel, King Saud University, KSA

Adam Kilgarriff, Lexical Computing Ltd, UK 

Amal AlSaif, Imam University, KSA

Maha AlYahya, King Saud University, KSA

Auhood AlFaries, King Saud University, KSA

Salwa Hamada, Taibah University, KSA

Mansour Algamdi, King Abdul Aziz City for Science and Technology, KSA

Abdullah Alfaifi, University of Leeds, UK

 

 

 


