Corpora: Czech National Corpus

Pavel Kveton kveton at slivka.ff.cuni.cz
Thu Dec 21 10:49:15 UTC 2000


In November 2000, a 100-million-word representative corpus of written Czech,
called SYN2000, was officially released for non-commercial use.

It is a major part of the Czech National Corpus project, which also comprises
other, smaller corpora that will be released gradually as well.
SYN2000 is essentially a corpus of contemporary Czech (its newspaper texts,
for example, date from 1991-1999) and is intended for many kinds of research,
for dictionary-makers etc. Access to it can be arranged free of charge, upon
signing a written statement, through http://ucnk.ff.cuni.cz, which also
provides some additional information. In addition, the same address offers
somewhat limited public access to some 20 million words of the large corpus.
An accompanying book about the Czech National Corpus, containing a manual for
using SYN2000, has just come out and is available from the Institute of the
Czech National Corpus, which is responsible for the corpora developed under
the project.

Professor Frantisek Cermak


