Resources: ELRA - Language Resources Catalogue - Update

Thierry Hamon hamon at LIMSI.FR
Wed Jan 15 18:13:52 UTC 2014

Date: Wed, 15 Jan 2014 16:17:30 +0100 (CET)
From: info at

[Apologies for cross-postings]

We are happy to announce that two new written corpora are now available
in our catalogue.

These corpora are part of the Nepali National Corpus, which was produced
in 2006 as part of the Bhasha Sanchar (“language communication”) project,
also known as Nelralec (Nepali Language Resources and Localization for
Education and Communication). The project was funded by the EU Asia
IT&C programme under reference number ASIE/2004/091-777.

ELRA-W0076 Nepali Monolingual written corpus
The Nepali Monolingual written corpus comprises the core corpus (core
sample) and the general corpus. The core sample (CS) is a collection of
Nepali written texts published between 1990 and 1992, drawn from 15
different genres in samples of 2,000 words each. It is based on the
FLOB/FROWN corpora and contains 802,000 words. The general corpus (GC)
consists of written texts collected opportunistically from a wide range
of sources, such as the Internet, newspapers, books, publishers and
authors. It contains 1,400,000 words.
For more information, see:

ELRA-W0077 English-Nepali Parallel Corpus
This corpus consists of a collection of national development texts in
English and Nepali. A small set of texts is aligned at the sentence level
(27,060 English words; 21,756 Nepali words), and a larger set at the
document level (617,340 English words; 596,571 Nepali words). An
additional set of Nepali monolingual data (386,879 words) is also
provided.
For more information, see:

For more information on the catalogue, please contact Valérie Mapelli
mailto:mapelli at

Visit our On-line Catalogue:
Visit the Universal Catalogue:
Archives of ELRA Language Resources Catalogue Updates:

Message distributed via the Langage Naturel mailing list <LN at>
Information, subscription:
English version:
Archives:

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership:

ATALA accepts no responsibility for the content of messages
distributed on the LN list.
