[Corpora-List] Classical Arabic corpora

Eric Atwell csc6ea at leeds.ac.uk
Fri Feb 3 17:45:56 UTC 2012


Dear Mai,

I guess you already know about the Quranic Arabic Corpus
http://corpus.quran.com/  - annotated with morphological tagging,
pronoun reference resolution, parallel English translation, 
partial parsing etc;  BUT this covers only the Quran,
c. 70K words (depending on how you tokenise Arabic into words).
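
To illustrate why the word count depends on tokenisation, here is a
minimal Python sketch. It is only an illustration under stated assumptions:
the proclitic list and the clitic_split_tokens() helper are hypothetical and
deliberately crude, and this is not how the Quranic Arabic Corpus itself
segments text.

```python
# Compare two ways of counting "words" in the same Arabic text.
# ASSUMPTION: PROCLITICS is a tiny hand-picked list for illustration only;
# a real system would use a proper morphological analyser.
PROCLITICS = ("ال", "و", "ب")

# Undiacritised sample text (the opening formula of the Quran).
SAMPLE = "بسم الله الرحمن الرحيم"


def whitespace_tokens(text):
    """Count a word as anything delimited by whitespace."""
    return text.split()


def clitic_split_tokens(text, min_stem=2):
    """Naively split one leading proclitic off each whitespace token.

    This crude rule over-splits some words; it only shows that a
    different tokenisation policy yields a different token count.
    """
    tokens = []
    for word in text.split():
        for prefix in PROCLITICS:
            # Split only if enough characters remain to form a stem.
            if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
                tokens.extend([prefix, word[len(prefix):]])
                break
        else:
            # No proclitic matched: keep the word whole.
            tokens.append(word)
    return tokens


if __name__ == "__main__":
    print("whitespace tokens:  ", len(whitespace_tokens(SAMPLE)))
    print("clitic-split tokens:", len(clitic_split_tokens(SAMPLE)))
```

The two counts differ for the same four-word phrase, which is why any
corpus size quoted for Arabic should say how words were delimited.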

Claudia Soria recommended the LRE Map
http://www.resourcebook.eu/LreMap/faces/views/resourceMap.xhtml
BUT the Arabic corpora there are all Modern Standard Arabic, 
except for Qurany: A Tool to Search for Concepts in the Quran
http://quranytopics.appspot.com/
  ... and this is also limited to the Quran

Gregory Crane's Perseus digital library of classical texts is mainly
Classical Greek and Latin, but there is a section labelled "Arabic"
  - BUT currently this contains only the Quran, plus dictionaries.

The Perseus website does have a pointer to another source:
"Perseus also wants to highlight the release on Alpheios.net of key 
texts in Classical Arabic, including Book of Songs, Arabian Nights, 
Arabic Reading Lessons, The Autobiography Of The Constantinopolitan
Story-Teller, Selection from the Annals of Tabari, Selections from
Arabic geographical literature and Voyages D'Ibn Batutah ..."
BUT while Alpheios.net enables online reading, it is not clear to me
whether you can download a whole book as a corpus text file.

At the NITS'2011 National Information Technology Symposium on 
"Arabic and Islamic Content on the Internet" at King Saud
University, Riyadh, Mansour Alghamdi outlined the KACST initiative
to collect a large Arabic corpus including Classical and Modern Arabic
http://nits2011.ksu.edu.sa/en/cap/CD/Keynote%20Speakers/Mansour%20Alghamdi.pdf
BUT I have not heard how far this has succeeded yet.

I believe the Kuwait government ministry of religious studies has plans 
to put online its collection of Classical Arabic texts; 
but again I have no news of progress on this.


If you get any better answers, please do let me know.

Eric Atwell, Leeds University




On Fri, 3 Feb 2012, Mai Zaki wrote:

> Dear all,
> 
> I was wondering if you could advise me if there are any available corpora
> for Classical Written Arabic in any genre. I'm looking for a corpus of
> Written Arabic of any age between the Classical Arabic of the Qur'an and
> Modern Standard Arabic.
> 
> Thanks a lot in advance,
> 
> Mai Zaki
> 
>

-- 
Eric Atwell, Senior Lecturer, Language Processing research group,
  I-AIBS Institute for Artificial Intelligence and Biological Systems
  School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
  Leeds LS2 9JT, England.        TEL: 0113-3435430  FAX: 0113-3435468
  WWW: http://www.comp.leeds.ac.uk/eric
       http://www.comp.leeds.ac.uk/nlp
