Arabic-L:LING:New Quran Arabic Treebank and Call for Volunteer Annotators
Dilworth Parkinson
dil at BYU.EDU
Wed Oct 21 23:32:47 UTC 2009
------------------------------------------------------------------------
Arabic-L: Wed 21 Oct 2009
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
unsubscribe arabic-l ]
-------------------------Directory------------------------------------
1) Subject: New Quran Arabic Treebank and Call for Volunteer Annotators
-------------------------Messages-----------------------------------
1)
Date: 21 Oct 2009
From: dukes.kais at googlemail.com
Subject: New Quran Arabic Treebank and Call for Volunteer Annotators
Hello All,
A new version of the Crescent Quran Corpus is now freely available
online at http://quran.uk.net. The corpus contains both morphological
and syntactic annotation of the Quran in Arabic. Previous releases of
the corpus focused on the morphology of Classical Arabic; this release
adds an in-progress syntactic treebank of the Quran.
Some new features of this release of the corpus include:
(1) Natural Language Generation (NLG) has been applied to provide
summaries in English of the morphology of each Arabic word of the
Quran. For example:
The fourth word of verse (21:70) is divided into 4 morphological
segments. A conjunction, verb, subject pronoun and object pronoun. The
prefixed conjunction fa is usually translated as "then" or "so". The
perfect verb (fi3il mad) is first person masculine plural. The verb's
root is jim 3ayn lam (j 3 l). The attached object pronoun is third
person masculine plural.
See http://quran.uk.net/TokenDetail.aspx?location=(21:70:4)
(2) Syntactic Treebank. Syntactic annotation of the Quran has been
expanded, using a hybrid dependency / constituency framework,
following traditional Arabic grammar (i'3raab). Syntactic annotation
is now available for chapters 67 to 114 (see
http://quran.uk.net/Treebank.aspx). Morphological annotation for all
of the Quran with part-of-speech
tagging has been reviewed and improved.
(3) Quran Java API. A Quran Java API for the text of the corpus has
been integrated into the website, and is freely available for download.
(4) Grammar Documentation and Annotation Guidelines. The website now
includes a comprehensive set of documentation on Arabic dependency
grammar, which also serves as a set of guidelines for corpus annotators.
(5) Audio Improvements. The website now offers 10 audio options,
including an audio English translation of each verse in the corpus.
(6) Arabic/English Lexicon of the Quran. The lexicon now includes root
counts for each entry.
(7) Improved Visualization. The website provides improved
visualization for 700 dependency graphs, with better website layout
and navigation.
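
As a rough illustration of items (1) and the location format used in
the URLs above, the sketch below parses a location such as (21:70:4)
into chapter, verse and word numbers, rebuilds the TokenDetail URL
quoted in item (1), and fills a simple sentence template in the spirit
of the generated summaries. It is written in Python and does not use
the project's Java API. The names Location, parse_location,
token_detail_url and summarize are hypothetical, and the template
approach is an assumption for illustration, not a description of how
the website actually generates its text.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# URL pattern taken from the example link in item (1).
BASE_URL = "http://quran.uk.net/TokenDetail.aspx"

# Spelled-out ordinals for the first few word positions (illustrative only).
ORDINAL_WORDS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth"}


@dataclass(frozen=True)
class Location:
    """Hypothetical container for a (chapter:verse[:word]) reference."""
    chapter: int
    verse: int
    token: Optional[int] = None  # None for a verse-level reference


def parse_location(text: str) -> Location:
    """Parse '(21:70)' or '(21:70:4)' into its numeric parts."""
    match = re.fullmatch(r"\((\d+):(\d+)(?::(\d+))?\)", text.strip())
    if match is None:
        raise ValueError(f"not a corpus location: {text!r}")
    chapter, verse, token = match.groups()
    return Location(int(chapter), int(verse), int(token) if token else None)


def token_detail_url(loc: Location) -> str:
    """Rebuild the word-level URL in the form shown in item (1)."""
    if loc.token is None:
        raise ValueError("a word-level location is required")
    return f"{BASE_URL}?location=({loc.chapter}:{loc.verse}:{loc.token})"


def ordinal(n: int) -> str:
    """Return 'fourth' for small n, otherwise a numeric form such as '12th'."""
    if n in ORDINAL_WORDS:
        return ORDINAL_WORDS[n]
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def summarize(loc: Location, segments: List[str]) -> str:
    """Fill a fixed sentence template from a list of segment labels."""
    if loc.token is None or not segments:
        raise ValueError("need a word-level location and at least one segment")
    if len(segments) == 1:
        listing = segments[0]
    else:
        listing = ", ".join(segments[:-1]) + " and " + segments[-1]
    noun = "segment" if len(segments) == 1 else "segments"
    return (
        f"The {ordinal(loc.token)} word of verse ({loc.chapter}:{loc.verse}) "
        f"is divided into {len(segments)} morphological {noun}: {listing}."
    )


if __name__ == "__main__":
    loc = parse_location("(21:70:4)")
    labels = ["conjunction", "verb", "subject pronoun", "object pronoun"]
    print(summarize(loc, labels))
    print(token_detail_url(loc))
```

The segment labels are passed in by hand here; in the corpus they
would come from the morphological annotation itself.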
----------------------------------------------------------------------
Interested in becoming a volunteer annotator?
We are currently looking for native Arabic speakers to assist in
corpus annotation, and in particular syntactic annotation. The
Crescent corpus is an open source community project with the aim of
producing accurate multi-level annotation of the Quran in Classical
Arabic, including morphological and syntactic annotation. The
framework adopted for syntactic annotation is that of traditional
Arabic dependency grammar (i'3raab).
For more information on the corpus please contact the main project
researcher.
Kais Dukes,
School of Computing
University of Leeds
United Kingdom
--------------------------------------------------------------------------
End of Arabic-L: 21 Oct 2009