Resources: XML/TEI Human Rights Corpus

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Tue Oct 11 14:07:33 UTC 2005

Date: Tue, 11 Oct 2005 10:07:21 +0200
From: Pincemin <benie at>
Message-ID: <434B72B9.8050303 at>

We are happy to announce the release of the Human Rights Corpus /
Corpus Droits de l'Homme, v.1, available on our web site: Université
de Paris 13 - Laboratoire de Linguistique Informatique

The corpus comprises 28 international conventions, from 1948
(Universal Declaration of Human Rights) up to 2000. The texts were
selected with an expert in the field, with the aim of giving a
representative view of the Human Rights reference texts and of the
language and vocabulary they use.

Each text is given in 2 or 3 languages: English and French, plus
Spanish when the Convention is a United Nations one. These versions
are aligned at the level of the finest subdivision (the article)
by means of a suitable identifier scheme.
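
As a rough illustration of identifier-based alignment, here is a minimal
Python sketch. It is not the corpus's actual scheme: the element names,
the "id" attribute, the identifier values and the placeholder texts are
all assumptions made for the example. It only shows the general idea
that articles carrying the same identifier in each language version can
be paired without any text-based alignment step.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: element names, the "id" attribute, the identifier
# values and the placeholder texts are illustrative assumptions, not taken
# from the corpus itself.
ENGLISH = """<body>
  <div type="article" id="art1"><p>English text of article 1</p></div>
  <div type="article" id="art2"><p>English text of article 2</p></div>
</body>"""

FRENCH = """<body>
  <div type="article" id="art1"><p>Texte français de l'article 1</p></div>
  <div type="article" id="art2"><p>Texte français de l'article 2</p></div>
</body>"""


def articles_by_id(xml_text):
    """Map each article identifier to the text of that article."""
    root = ET.fromstring(xml_text)
    return {
        div.get("id"): "".join(div.itertext()).strip()
        for div in root.iter("div")
        if div.get("type") == "article"
    }


def align(*versions):
    """Pair up articles that share an identifier in every version."""
    indexed = [articles_by_id(v) for v in versions]
    shared = [i for i in indexed[0] if all(i in other for other in indexed[1:])]
    return [(i, tuple(ix[i] for ix in indexed)) for i in shared]


aligned = align(ENGLISH, FRENCH)
for ident, texts in aligned:
    print(ident, texts)
```

A third version (for instance Spanish) could be passed as an extra
argument to align(); articles missing from any version are simply left
out of the result.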

The encoding is in XML and follows the TEI guidelines. Special
attention has been paid to the Header; in particular, the "TagUsage"
part is fully developed in order to explain the encoding choices and
the meaning of each XML/TEI tag in our context.
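
To give an idea of how such documentation can be consulted by program,
here is a minimal Python sketch that lists the tag descriptions found in
a TEI header. The header fragment is invented for the example: it assumes
un-namespaced elements and a tagsDecl section whose tagUsage elements
carry a "gi" attribute naming the documented tag, and the descriptions
shown are hypothetical, not those of the corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical header fragment: the descriptions are invented, and the
# absence of an XML namespace is an assumption made for this example.
HEADER = """<teiHeader>
  <encodingDesc>
    <tagsDecl>
      <tagUsage gi="div">Hypothetical note: one div per article.</tagUsage>
      <tagUsage gi="p">Hypothetical note: one p per paragraph of an article.</tagUsage>
    </tagsDecl>
  </encodingDesc>
</teiHeader>"""


def tag_usage(header_xml):
    """Map each documented tag name to its whitespace-normalised description."""
    root = ET.fromstring(header_xml)
    return {
        el.get("gi"): " ".join("".join(el.itertext()).split())
        for el in root.iter("tagUsage")
    }


usage = tag_usage(HEADER)
for tag, description in usage.items():
    print(f"<{tag}>: {description}")
```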

Please contact us to let us know your interests or remarks:
corpus at

Fabrice ISSAC, Computational Linguist
Christine CHODKIEWICZ, Lawyer and Linguist
Bénédicte PINCEMIN, Linguist

Message distributed by the Langage Naturel list <LN at>
Information, subscription:
English version          :
Archives                 :

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership:
