[Corpora-List] A new resource for spoken French

Christophe Benzitoun Christophe.Benzitoun at univ-nancy2.fr
Thu Jul 5 19:56:36 UTC 2012


Dear all,

We are glad to announce the availability of the Perceo Corpus, a
collection of lemmatized and POS-tagged transcriptions of spoken French.
The corpus contains over 100,000 tokens that were automatically tagged
and then manually checked. The distribution includes:

- The tagged corpus in TreeTagger format;

- A lexicon merging Morphalou 2.0 with the Perceo corpus;

- A parameter file (.par) for TreeTagger.
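
For readers new to the format: TreeTagger output is usually one token per line with three tab-separated columns (word form, POS tag, lemma). The minimal Python sketch below assumes the Perceo files follow that layout. The column order, the helper names `read_treetagger` and `TaggedToken`, and the sample tags are illustrative assumptions, not documentation of the actual distribution.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class TaggedToken(NamedTuple):
    word: str
    pos: str
    lemma: str


def read_treetagger(stream: TextIO) -> Iterator[TaggedToken]:
    """Yield one TaggedToken per non-blank line of word<TAB>pos<TAB>lemma text.

    Assumption: three tab-separated columns in this order (hypothetical helper).
    """
    for line_no, line in enumerate(stream, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines (e.g. between utterances)
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated fields, got {len(fields)}"
            )
        yield TaggedToken(*fields)


if __name__ == "__main__":
    # Hypothetical sample; the tags are illustrative, not taken from Perceo.
    sample = "je\tPRO:PER\tje\nsuis\tVER:pres\têtre\n"
    for tok in read_treetagger(io.StringIO(sample)):
        print(tok.word, tok.pos, tok.lemma)
```

To read a downloaded file, open it with `open(path, encoding="utf-8")` (where `path` is wherever you saved it) and pass the file object to `read_treetagger`.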

All files are UTF-8 encoded and freely downloadable. The resource was
initially designed for training part-of-speech tagging software.
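
Since the resource is meant for training taggers, here is a hedged illustration of how word/tag pairs taken from the corpus could feed a simple most-frequent-tag (unigram) baseline in Python. This is a swapped-in textbook technique, not TreeTagger's own decision-tree method. The function names, the lowercasing of words, the fallback to the globally most frequent tag, and the sample tags are all assumptions made for the sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def train_most_frequent_tag(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], str]:
    """Learn each word's most frequent tag, plus a global fallback tag (hypothetical helper)."""
    per_word: Dict[str, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for word, pos in pairs:
        key = word.lower()  # assumption: case-insensitive lookup
        per_word[key][pos] += 1
        overall[pos] += 1
    if not overall:
        raise ValueError("no training pairs given")
    model = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    fallback = overall.most_common(1)[0][0]  # used for unknown words
    return model, fallback


def tag_words(words: List[str], model: Dict[str, str], fallback: str) -> List[str]:
    """Tag known words with their learned tag and unknown words with the fallback."""
    return [model.get(w.lower(), fallback) for w in words]


if __name__ == "__main__":
    # Hypothetical training pairs; the tags are illustrative only.
    training = [("le", "DET"), ("chat", "NOM"), ("le", "PRO"), ("le", "DET"), ("dort", "VER")]
    model, fallback = train_most_frequent_tag(training)
    print(tag_words(["Le", "chat", "mange"], model, fallback))
```

A baseline like this mainly serves as a point of comparison when evaluating a properly trained tagger.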

The download site is:
http://cnrtl.fr/corpus/perceo/

Best regards.

-- 
Christophe Benzitoun, Maître de conférences (Associate Professor), Université de Lorraine
Elected member of the Conseil d'Administration (Board of Directors)
UFR Sciences du langage (Faculty of Language Sciences)
Member of ATILF - Université de Lorraine & CNRS
44, avenue de la Libération
BP 30687
54063 Nancy cedex
tel : 03 54 50 53 40
e-mail : Christophe.Benzitoun at univ-lorraine.fr


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora