Corpora: Announcing a large Portuguese corpus

Diana Maria de Sousa Marques Pinto dos Santos Diana.Santos at informatics.sintef.no
Tue Sep 5 10:48:27 UTC 2000


Dear members of the corpora list,

We would like to announce the release of CETEMPúblico, a large corpus
(approx. 180 million words) of Portuguese newspaper language from the
Portuguese daily newspaper Público, created by our project as another
initiative to foster R&D in the processing of the Portuguese language.

Please see the corpus page for further details on distribution and
availability:
http://cgi.portugues.mct.pt/cetempublico/

Diana Santos & Paulo Rocha

Computational processing of Portuguese
http://www.portugues.mct.pt/
SINTEF Telecom and Informatics
Box 124 Blindern, N-0314 Oslo, Norway
projecto at informatics.sintef.no
