Arabic-L:LING:New Quran Arabic Treebank and Call for Volunteer Annotators

Dilworth Parkinson dil at BYU.EDU
Wed Oct 21 23:32:47 UTC 2009


------------------------------------------------------------------------
Arabic-L: Wed 21 Oct 2009
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
             unsubscribe arabic-l                                      ]

-------------------------Directory------------------------------------

1) Subject: New Quran Arabic Treebank and Call for Volunteer Annotators

-------------------------Messages-----------------------------------
1)
Date: 21 Oct 2009
From: dukes.kais at googlemail.com
Subject: New Quran Arabic Treebank and Call for Volunteer Annotators

Hello All,

A new version of the Crescent Quran Corpus is now freely available  
online at http://quran.uk.net. The corpus contains both morphological  
and syntactic annotation of the Quran in Arabic. Previous releases of  
the corpus focused on the morphology of Classical Arabic, but this new  
release now includes an in-progress syntactic treebank of the Quran.  
Some new features of this release of the corpus include:

(1) Natural Language Generation (NLG) has been applied to provide  
summaries in English of the morphology of each Arabic word of the  
Quran. For example:

The fourth word of verse (21:70) is divided into 4 morphological  
segments. A conjunction, verb, subject pronoun and object pronoun. The  
prefixed conjunction fa is usually translated as "then" or "so". The  
perfect verb (fi3il mad) is first person masculine plural. The verb's  
root is jim 3ayn lam (j 3 l). The attached object pronoun is third  
person masculine plural.
See http://quran.uk.net/TokenDetail.aspx?location=(21:70:4)
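
As a rough illustration of how a summary like the one above could be
produced from a list of morphological segments, here is a minimal
template-based sketch in Python. It is not the corpus's actual NLG code:
the Segment class, the summarize and parse_location functions, and the
wording template are all assumptions made for this example. Only the
"(chapter:verse:word)" location format is taken from the URL above.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One morphological segment of a word (hypothetical structure)."""
    kind: str         # e.g. "conjunction", "verb", "object pronoun"
    detail: str = ""  # optional full sentence describing this segment


def parse_location(text):
    """Parse a location such as "(21:70:4)" into (chapter, verse, word)."""
    match = re.fullmatch(r"\((\d+):(\d+):(\d+)\)", text)
    if match is None:
        raise ValueError(f"not a (chapter:verse:word) location: {text!r}")
    return tuple(int(group) for group in match.groups())


def join_list(items):
    """Join items as "a, b and c"; a single item is returned unchanged."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def summarize(chapter, verse, word, segments):
    """Fill a fixed English template from the segment list."""
    noun = "segment" if len(segments) == 1 else "segments"
    kinds = join_list([segment.kind for segment in segments])
    head = (
        f"Word {word} of verse ({chapter}:{verse}) is divided into "
        f"{len(segments)} morphological {noun}: {kinds}."
    )
    # Append any per-segment sentences in segment order.
    details = [segment.detail for segment in segments if segment.detail]
    return " ".join([head] + details)


if __name__ == "__main__":
    chapter, verse, word = parse_location("(21:70:4)")
    segments = [
        Segment("conjunction",
                'The prefixed conjunction fa is usually translated as '
                '"then" or "so".'),
        Segment("verb"),
        Segment("subject pronoun"),
        Segment("object pronoun",
                "The attached object pronoun is third person masculine "
                "plural."),
    ]
    print(summarize(chapter, verse, word, segments))
```

The sketch keeps the segment data separate from the sentence template, so
the same annotation could in principle be rendered with different wording
or in another language by changing only summarize.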

(2) Syntactic Treebank. Syntactic annotation of the Quran has been  
expanded, using a hybrid dependency / constituency framework,  
following traditional Arabic grammar (i'3raab). Syntactic annotation  
is now available for chapters 67 to 114 (see
http://quran.uk.net/Treebank.aspx). Morphological annotation for all of
the Quran, with part-of-speech tagging, has been reviewed and improved.

(3) Quran Java API. A Quran Java API for the text of the corpus has  
been integrated into the website, and is freely available for download.

(4) Grammar Documentation and Annotation Guidelines. The website now  
includes a comprehensive set of documentation on Arabic dependency  
grammar, which also serves as a set of guidelines for corpus annotators.

(5) Audio Improvements. A selection of 10 audio choices is now  
available, including an audio English translation of each verse in the  
corpus.

(6) Arabic/English Lexicon of the Quran. The lexicon now includes root  
counts for each entry.

(7) Improved Visualization. The website provides improved  
visualization for 700 dependency graphs, with better website layout  
and navigation.

----------------------------------------------------------------------

Interested in becoming a volunteer annotator?

We are currently looking for native Arabic speakers to assist in  
corpus annotation, and in particular syntactic annotation. The  
Crescent corpus is an open source community project with the aim of  
producing accurate multi-level annotation of the Quran in Classical  
Arabic, including morphological and syntactic annotation. The  
framework adopted for syntactic annotation is that of traditional  
Arabic dependency grammar (i'3raab).

For more information on the corpus please contact the main project  
researcher.

Kais Dukes,

School of Computing
University of Leeds
United Kingdom


--------------------------------------------------------------------------
End of Arabic-L:  21 Oct 2009

