[Corpora-List] New computational resources for Galician language

mario.barcala at mundo-r.com
Thu Oct 28 17:53:33 UTC 2010


Dear all:

I am pleased to announce the release of several computational resources 
for the Galician language, developed at the Centro Ramón Piñeiro para a 
Investigación en Humanidades (http://www.cirp.es).

First, an updated version of CORGA (Reference Corpus of Present-day Galician 
Language). This latest version (1.5) contains 25.8 million words and includes 
a frequency list of all words, ready to download.

It is available at the usual location:

http://corpus.cirp.es/corga

Second, the new version (2.4) of the lexicon (721,073 entries) and training 
corpus (426,051 grammatical elements) used by the XIADA tagger 
(Tagger/Lemmatizer for Galician Language) can be downloaded at:

http://corpus.cirp.es/xiada

This time the training corpus package includes an XML file that links 
grammatical elements to the actual forms in the sentences. This way, our 
training corpus can easily be used with other taggers.
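
As a rough illustration of how such a linking file might be consumed by 
another tool, the Python sketch below converts a small sample into 
tab-separated lines (surface form, grammatical element, lemma, tag). The 
layout is purely an assumption made for this example: the element and 
attribute names (sentence, form, element, text, lemma, tag) and the tag 
values are hypothetical and are not taken from the XIADA package, so the 
real file should be inspected before adapting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; this is NOT the real XIADA schema or tagset.
# One surface form ("Cantouno") is linked to two grammatical elements.
SAMPLE = """\
<corpus>
  <sentence id="1">
    <form text="Cantouno">
      <element tag="VERB" lemma="cantar">Cantou</element>
      <element tag="PRON" lemma="o">o</element>
    </form>
    <form text="onte">
      <element tag="ADV" lemma="onte">onte</element>
    </form>
  </sentence>
</corpus>
"""


def to_tagger_lines(xml_text):
    """Return one tab-separated line per grammatical element.

    Columns: surface form, grammatical element, lemma, tag.
    An empty string is appended after each sentence as a separator.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for sentence in root.iter("sentence"):
        for form in sentence.iter("form"):
            surface = form.get("text")
            for element in form.iter("element"):
                lines.append("\t".join([
                    surface,
                    element.text,
                    element.get("lemma"),
                    element.get("tag"),
                ]))
        lines.append("")  # sentence boundary
    return lines


if __name__ == "__main__":
    print("\n".join(to_tagger_lines(SAMPLE)))
```

Keeping the surface form on every line is one possible design choice: it 
lets a tagger that works on grammatical elements still recover the original 
token when one form is split into several elements.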

The frequency list from CORGA and both the lexicon and the training corpus 
from XIADA are distributed under the Lesser General Public License For 
Linguistic Resources (LGPLLR). See the corresponding packages for details.

Finally, the documents included in the training corpus are published online 
through a search system that allows querying by forms, tags and/or lemmas. 
The corpus was automatically tagged by XIADA and manually revised. The search 
system includes some useful new features:

  - An easy-to-use tag input menu.

  - One more query field, which makes it possible to search for up to four 
grammatical elements.

This search system is available at: 

http://corpus.cirp.es/corgaetq

Our long-term aim is to offer all CORGA texts through this system, which 
should substantially improve queries and results.

Regards,

--
Fco. Mario Barcala Rodríguez
Computing manager of the CORGA project


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora