[Corpora-List] Irish language corpora

Kevin Scannell kscanne at gmail.com
Fri Nov 24 23:09:52 UTC 2006


On 16:31 Fri 24 Nov, Mike Maxwell wrote:
> fitzgerr at aston.ac.uk wrote:
> >I am looking for a corpus of Irish language for some research, but all I
> >seem to be able to find are corpora based on literary texts, predominantly
> >dated from before the 20th Century.  For my research purposes, I need a
> >corpus that contains terminology that is as contemporary as possible.
> 
> I presume you've looked at the NCI (New Corpus for Ireland), and that 
> it doesn't meet your needs.
> 
> Have you looked at Kevin Scannell's collection 
> (http://borel.slu.edu/crubadan/index.html)?  Looks like he has a 25M 
> word corpus of Irish, which I believe he collected entirely off the web.

Yes, I have large web-crawled corpora from the crubadan project,
and also from some ongoing web crawling in support of my
search engine www.aimsigh.com (a description of that site in
English is here: http://www.aimsigh.com/eolas.html - some list
members might find the ideas behind the site interesting even though
it only supports Irish at the moment).  In all, there are about
100 million words of Irish on the web that are indexed by the site. 

Ronan, feel free to write me off-list and I can see about putting
together something suitable for you.

-Kevin


