Corpora: Czech National Corpus
Pavel Kveton
kveton at slivka.ff.cuni.cz
Thu Dec 21 10:49:15 UTC 2000
In November 2000, a 100-million-word representative corpus of written Czech,
called SYN2000, was officially released for non-commercial use.
It is the major part of the Czech National Corpus project, which also
comprises other, smaller corpora that will be released gradually as well.
SYN2000 is essentially a corpus of contemporary Czech (its newspaper texts,
for example, date from 1991-1999) and is intended for many kinds of research,
for dictionary-makers, etc. Access to it can be arranged free of charge, upon
signing a written statement, through http://ucnk.ff.cuni.cz, which also
provides some additional information. In addition, the same address offers
public, somewhat limited access to some 20 million words of the large corpus.
An accompanying book about the Czech National Corpus, containing a manual for
using SYN2000, has just come out and is available from the Institute of the
Czech National Corpus, which is responsible for the corpora developed under
the project.
Professor Frantisek Cermak