[Corpora-List] New computational resources for Galician language

Fco. Mario Barcala Rodríguez mario.barcala at mundo-r.com
Mon May 6 15:39:52 UTC 2013


Dear all:

I am pleased to announce the publication of several computational
resources for the Galician language, developed at the Centro Ramón Piñeiro
para a Investigación en Humanidades (http://www.cirp.es).

First, an updated version of CORGA (Reference Corpus of Present-day
Galician Language) is out. This latest version (1.6) contains 29 million
words and includes a frequency list of all its words, ready to download.

It is available at the usual location:

http://corpus.cirp.es/corga

Second, version 2.5 of the lexicon (730,256 entries) and of the
training corpus (594,993 grammatical elements) used by the XIADA tagger
(Tagger/Lemmatizer for Galician Language) can be downloaded at:

http://corpus.cirp.es/xiada

The CORGA frequency list, as well as the XIADA lexicon and training
corpus, are distributed under the Lesser General Public License for
Linguistic Resources (LGPLLR). See the corresponding packages for
details.

Finally, the documents in the training corpus, which were automatically
tagged by XIADA and then manually revised, are now published online
through a search system that supports queries by word form, tag and/or
lemma.

This search system is available at:

http://corpus.cirp.es/corgaetq

Our medium-term aim is to offer all CORGA texts through this system,
which will substantially improve both the queries available and the
results they return.

Regards,

--
Mario Barcala
Computer engineering manager of the CORGA project


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora