Info: ELRA - Language Resources Catalogue - Update

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Wed Sep 29 07:54:35 UTC 2010

Date: Mon, 27 Sep 2010 18:12:43 +0200
From: info at
Message-ID: <4CA0C27B.1090203 at>

Our apologies if you have received multiple copies of this announcement.

ELRA - Language Resources Catalogue - Update

ELRA is happy to announce that 1 new Written Corpus and 2 new Monolingual
Lexicons are now available in its catalogue:

ELRA-W0054 Persian 1984 corpus (Multext-East framework)

This corpus contains the Persian (Farsi) translation of part of the
novel "1984" (G. Orwell), annotated in the Multext-East framework
(Multilingual Text Tools and Corpora for Eastern and Central European
Languages). The corpus contains approximately 100,000 words (6,604
sentences, 13,247 lemmas), with extensive headers and markup for
document structure, sentences, and various sub-sentence annotations in
XML format following the TEI guidelines.
Annotation includes part-of-speech (POS) tags and lemmas.
For more information, see:

ELRA-L0086 Persian Multext-East framework lexicon

This is a Persian (Farsi) morphosyntactic lexicon derived from the
Persian 1984 corpus (Multext-East framework) (see ELRA-W0054). It
contains the full inflectional paradigms of a superset of lemmas that
appear in the Persian 1984 corpus. Each entry gives the word-form, its
lemma and morphosyntactic description. The lexicon contains 13,247
For more information, see:

ELRA-L0087 Persian lexicon

This is a Persian (Farsi) lexicon of more than 40,000 entries of
non-inflected word forms. Each word is transliterated following the
framework proposed by MBROLA (a text-to-speech synthesizer). The
database includes a large variety of descriptors for each entry
(plural, homograph, ...). The lexicon is provided in an MS Access
For more information, see:

For more information on the catalogue, please contact Valérie Mapelli
mapelli at

Visit our On-line Catalogue:
Visit the Universal Catalogue:
Archives of ELRA Language Resources Catalogue Updates:

Message distributed by the Langage Naturel list <LN at>
Information, subscription:
English version:
Archives:

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership:
