Arabic-L:LING:Quranic Arabic Corpus-Version 0.1 Released
Dilworth Parkinson
dil at BYU.EDU
Thu Nov 12 20:39:54 UTC 2009
------------------------------------------------------------------------
Arabic-L: Thu 12 Nov 2009
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
unsubscribe arabic-l ]
-------------------------Directory------------------------------------
1) Subject:Quranic Arabic Corpus-Version 0.1 Released
-------------------------Messages-----------------------------------
1)
Date: 12 Nov 2009
From:Kais Dukes <dukes.kais at googlemail.com>
Subject:Quranic Arabic Corpus-Version 0.1 Released
Hello All,
For those interested in Arabic part-of-speech tagging and syntactic
analysis, a new resource has now been made available as a free,
open-source download:
http://quran.uk.net
You can now obtain version 0.1 of the data which includes:
(1) A plain text file showing each word in every verse of the Quran,
together with its (contextual) part-of-speech tag.
(2) The same data in XML format, encoded as UTF-8.
(3) A more detailed XML file with full morphological (inflection and
derivation) feature tags.
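As a rough illustration of how the plain-text file in (1) might be
processed, here is a minimal Python sketch. The tab-separated layout,
the chapter:verse:word location format, the tag names, and the function
names are all assumptions made for illustration; they are not the
documented format of the release and should be checked against the
actual download.

```python
from collections import Counter

# Hypothetical sample data. The real file layout and tag set may differ;
# the tags below (N, PN, ADJ) are placeholders, not the corpus tag set.
SAMPLE = """\
# assumed layout: chapter:verse:word<TAB>tag
1:1:1\tN
1:1:2\tPN
1:1:3\tADJ
1:1:4\tADJ
"""


def parse_tagged_words(text):
    """Yield (chapter, verse, word, tag) tuples from the assumed layout."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        location, tag = line.split("\t")
        chapter, verse, word = (int(part) for part in location.split(":"))
        yield chapter, verse, word, tag


def tag_counts(text):
    """Count how often each part-of-speech tag occurs."""
    return Counter(tag for *_, tag in parse_tagged_words(text))


if __name__ == "__main__":
    for tag, count in tag_counts(SAMPLE).most_common():
        print(tag, count)
```

Keeping the location as three integers makes it easy to group words by
verse or chapter later on.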
We plan to produce incremental updates until we reach version 1.0:
cross-annotator verification of the full morphology and syntax of the
Quran using dependency grammar.
The Quranic Arabic Corpus is an annotated linguistic resource
consisting of 77,430 words of Quranic Arabic. The research project is
led by Kais Dukes at the University of Leeds, and is part of the
Arabic language computing research group within the School of
Computing, supervised by Eric Atwell.
The project aims to provide a richly annotated linguistic resource for
researchers wanting to study the Arabic language of the Quran. The
grammatical analysis helps readers uncover the detailed intended
meaning of each verse and sentence. Each word of the Quran is tagged
with its part of speech as well as multiple morphological features.
Unlike other annotated Arabic corpora, the Quranic Corpus adopts the
traditional Arabic grammar of i'rab as its grammatical framework.
The research project includes:
- A manually verified part-of-speech tagged Quranic Arabic corpus.
- An annotated treebank of Quranic Arabic.
- A novel visualization of traditional Arabic grammar through
dependency graphs.
- Morphological search for the Quran.
- A machine-readable morphological lexicon of Quranic words, with
English translations.
- A part-of-speech concordance for Quranic Arabic organized by lemma.
- An online message board for community volunteer annotation.
The annotation for each of the 77,430 words in the Quran has been
reviewed in stages by two annotators, and work is ongoing to further
improve accuracy.
Any feedback on the project is most welcome.
Kind Regards,
-- Kais Dukes
School of Computing
University of Leeds
web: http://quran.uk.net
e-mail: sckd at leeds.ac.uk
--------------------------------------------------------------------------
End of Arabic-L: 12 Nov 2009