Arabic-L:LING:Prague Arabic Dependency Treebank 1.0

Dilworth Parkinson dilworth_parkinson at byu.edu
Mon Jan 10 01:37:14 UTC 2005


-------------------------------------------------------------------------
Arabic-L: Mon 09 Jan 2005
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
            unsubscribe arabic-l                                      ]

-------------------------Directory------------------------------------

1) Subject: Prague Arabic Dependency Treebank

-------------------------Messages-----------------------------------
1)
Date: 09 Jan 2005
From: smrz at ckl.ms.mff.cuni.cz
Subject: Prague Arabic Dependency Treebank

Dear research colleagues,

We would like to announce the release of Prague Arabic Dependency
Treebank 1.0, and we apologize if you have already received the same
information through the LDC Newsletter:

-------------------------------------
Prague Arabic Dependency Treebank 1.0
-------------------------------------

LDC Catalog No.: LDC2004T23
ISBN: 1-58563-319-4
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T23>

Prague Arabic Dependency Treebank <http://ckl.mff.cuni.cz/padt/> (PADT)
not only provides multi-level linguistic annotations of Modern Standard
Arabic, but also offers a variety of unique software implementations
designed for general use in Natural Language Processing.

The corpus of PADT 1.0 consists of morphologically and analytically
annotated newswire texts of Modern Standard Arabic, which originate from
the Arabic Gigaword
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T12>
and the plain data of Penn Arabic Treebank, Part 1
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T06>
and Penn Arabic Treebank, Part 2
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T02>.

The PADT 1.0 distribution comprises over 113 500 tokens of data annotated
analytically and provided with disambiguated morphological information.
In addition, the release includes complete MorphoTrees annotations
covering more than 148 000 tokens, 49 000 of which have also received
analytical processing. The contents are further divided into data sets
as indicated in the documentation <http://ckl.mff.cuni.cz/padt/>.

Prague Arabic Dependency Treebank 1.0 is distributed on one CD-ROM.

Institutions that have membership in the LDC for the Membership Year (MY)
2004 will be able to receive this corpus free of charge. Nonmembers may
license this data for US$100.

-------------------------------------

The extensive documentation and most of the tools are, however, available
on the project's website <http://ckl.mff.cuni.cz/padt/PADT_1.0/docs/>
under open-source licenses.

Should you find the Prague Arabic Dependency Treebank 1.0 interesting to
use in your research, please fill in the license registration form
<http://ckl.mff.cuni.cz/padt/PADT_1.0_license.html>. Thank you very much.

We will be pleased to answer your questions and discuss your comments.

Yours sincerely,

Authors of the Prague Arabic Dependency Treebank

--------------------------------------------------------------------------
End of Arabic-L:  09 Jan 2005


