[Corpora-List] New computational resources for Galician language
mario.barcala at mundo-r.com
Thu Oct 28 17:53:33 UTC 2010
Dear all:
I am pleased to announce the release of several computational resources
for the Galician language, developed at the Centro Ramón Piñeiro para a
Investigación en Humanidades (http://www.cirp.es).
First, an updated version of CORGA (Reference Corpus of Present-day Galician
Language). This latest version (1.5) contains 25.8 million words and includes
a complete word-frequency list that is available for download.
It is available at the usual location:
http://corpus.cirp.es/corga
Second, the new version (2.4) of the lexicon (721,073 entries) and training
corpus (426,051 grammatical elements) used by the XIADA tagger
(Tagger/Lemmatizer for Galician Language) can be downloaded at:
http://corpus.cirp.es/xiada
This time the training corpus package includes an XML file that links
grammatical elements to the actual word forms in the sentences. This makes it
easy to use our training corpus with other taggers.
The frequency list from CORGA and both the lexicon and the training corpus from
XIADA are distributed under the Lesser General Public License For Linguistic
Resources (LGPLLR). See the corresponding packages for details.
Finally, the documents included in the training corpus are published online
through a search system that supports queries on forms, tags and/or lemmas. The
corpus was tagged automatically by XIADA and then revised manually. The search
system includes some useful new features:
- An easy-to-use menu for entering tags.
- One more query field, so that a search can cover up to four
grammatical elements.
This search system is available at:
http://corpus.cirp.es/corgaetq
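
To illustrate the kind of query described above (a sequence of up to four
grammatical elements, each constrained by form, tag and/or lemma), here is a
minimal Python sketch. It is not the code behind the online system: the names
Token, Slot and search are hypothetical, the tags DET, NOUN, VERB and ADJ are
placeholders rather than the XIADA tagset, and matching adjacent elements by
exact comparison is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_SLOTS = 4  # the announcement mentions searching up to four elements


@dataclass(frozen=True)
class Token:
    """One grammatical element of a tagged sentence (hypothetical layout)."""
    form: str
    tag: str
    lemma: str


@dataclass(frozen=True)
class Slot:
    """One query field; a value of None means 'no constraint'."""
    form: Optional[str] = None
    tag: Optional[str] = None
    lemma: Optional[str] = None

    def matches(self, tok: Token) -> bool:
        # Forms and lemmas are compared case-insensitively, tags exactly.
        return (
            (self.form is None or self.form.lower() == tok.form.lower())
            and (self.tag is None or self.tag == tok.tag)
            and (self.lemma is None or self.lemma.lower() == tok.lemma.lower())
        )


def search(tokens: List[Token], slots: List[Slot]) -> List[int]:
    """Return the start index of every run of adjacent tokens matching slots."""
    if not 1 <= len(slots) <= MAX_SLOTS:
        raise ValueError(f"a query needs between 1 and {MAX_SLOTS} slots")
    n = len(slots)
    return [
        i
        for i in range(len(tokens) - n + 1)
        if all(s.matches(t) for s, t in zip(slots, tokens[i:i + n]))
    ]


# Example sentence "A casa é grande", tagged by hand with placeholder tags.
sentence = [
    Token("A", "DET", "o"),
    Token("casa", "NOUN", "casa"),
    Token("é", "VERB", "ser"),
    Token("grande", "ADJ", "grande"),
]

# A determiner immediately followed by any form of the lemma "casa".
hits = search(sentence, [Slot(tag="DET"), Slot(lemma="casa")])
```

Each unset field in a Slot acts as a wildcard, so the same function covers
queries by form only, by tag only, or by any combination of the three.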
Our long-term aim is to offer all CORGA texts through this system, which should
substantially improve both the queries available and the results returned.
Regards,
--
Fco. Mario Barcala Rodríguez
Computing manager of the CORGA project