Soft: Wiki2Tei converter 1.0

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Fri Oct 12 19:14:31 UTC 2007

Date: Wed, 10 Oct 2007 20:35:23 +0200
From: Sylvain Loiseau <sylvain.loiseau at>
Message-ID: <20071010203523.o6kh3o0udc44088k at>

We are pleased to announce the first release of the Wiki2Tei software.
Wiki2Tei is a converter from the MediaWiki format to XML (TEI).

The MediaWiki format is used by the Wikimedia Foundation wikis
(Wikipedia, Wikibooks, Wikisource) and by many other wikis running the
MediaWiki software. Large amounts of free, high-quality structured
text are available in this format. These texts are used more and more
often in NLP (natural language processing) projects. However, the
MediaWiki parser is oriented towards rendering, and the MediaWiki
syntax is complex and hard to parse.

The Wiki2Tei converter makes the information contained in the wiki
syntax (structure, highlighting, etc.) available and allows the plain
text to be retrieved properly. The conversion is intended to preserve
all the properties of the original text. Wiki2Tei is closely coupled
with the MediaWiki software, which allows it to convert all the
features of the MediaWiki syntax.

The Wiki2Tei converter provides a rich set of tools for converting
MediaWiki text from several sources (file, MediaWiki database) and
for managing collections of files to be converted. The TEI vocabulary
used is documented, in accordance with the TEI Guidelines, in an ODD
document. The code is open source and may be downloaded from the
SourceForge download area:

The web site contains full documentation and a "demo":

A mailing list is open:

Bernard Desgraupes,
Sylvain Loiseau

Message distributed by the Langage Naturel list <LN at>
Information, subscription:
English version       : 
Archives                 :

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership:
