[Corpora-List] Irish language corpora

Adam Kilgarriff adam at lexmasterclass.com
Sat Nov 25 07:40:42 UTC 2006


Ronan,

The NCI (New Corpus for Ireland) contains 30M words of Irish from a wide
range of sources, with an emphasis on contemporary language.  It was
developed by our company, Lexicography MasterClass, and commissioned by
Foras na Gaeilge (FnG, the "Board for the Irish Language") to support the
development of a new English-Irish dictionary.  I have forwarded your
enquiry to FnG, who own the data and so decide who is given access.

I'm just proofreading a paper on it - let me know if you want a copy.

Adam Kilgarriff
Lexicography MasterClass Ltd
Lexical Computing Ltd
University of Sussex

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Mike Maxwell
Sent: 24 November 2006 21:31
To: CORPORA at UIB.NO
Subject: Re: [Corpora-List] Irish language corpora

fitzgerr at aston.ac.uk wrote:
> I am looking for a corpus of Irish language for some research, but all I
> seem to be able to find are corpora based on literary texts, predominantly
> dated from before the 20th Century.  For my research purposes, I need a
> corpus that contains terminology that is as contemporary as possible.

I presume you've looked at the NCI (New Corpus for Ireland), and that 
it doesn't meet your needs.

Have you looked at Kevin Scannell's collection 
(http://borel.slu.edu/crubadan/index.html)?  Looks like he has a 25M 
word corpus of Irish, which I believe he collected entirely off the web.
-- 
	Mike Maxwell
	maxwell at ldc.upenn.edu


