[Corpora-List] KOLOKACJE program

Beata Wójtowicz wierzchob at wp.pl
Thu Apr 8 22:08:04 UTC 2004


On behalf of Aleksander Buczyński, I would like to announce the availability
of "Kolokacje", a new program that combines a web crawler with a collocation
finder.
The program was written by Aleksander Buczyński and is distributed free of
charge under the GNU General Public License.

The program can be used to:
- build a corpus of texts from selected websites, with an option to filter
out most of the HTML "noise" (duplicate pages, menus, etc.);
- monitor changes on selected websites;
- find strong and/or frequent collocations;
- find keywords for a collection of documents;
- get sample contexts (concordances) for given words or collocations;
- compare 14 different statistical tests used for collocation detection.
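To illustrate what "strong" versus "frequent" collocations can mean, here is a minimal sketch of two widely used association measures computed from raw bigram counts: pointwise mutual information (PMI), which tends to reward exclusive but possibly rare pairs, and the t-score, which tends to reward frequent pairs. This is not Kolokacje's code; the announcement does not say which 14 tests the program implements or how. The class name CollocationScores, the whitespace-free tokens, and the use of the token count as N for both unigram and bigram probabilities are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, NOT part of the Kolokacje distribution.
public class CollocationScores {

    // Pointwise mutual information (base 2) of a bigram from raw counts:
    // c12 = bigram count, c1/c2 = counts of its two words, n = corpus size.
    static double pmi(long c12, long c1, long c2, long n) {
        return Math.log((double) c12 * n / ((double) c1 * c2)) / Math.log(2);
    }

    // t-score: observed bigram count minus the count expected under
    // independence (c1 * c2 / n), divided by sqrt(observed).
    static double tScore(long c12, long c1, long c2, long n) {
        double expected = (double) c1 * c2 / n;
        return (c12 - expected) / Math.sqrt(c12);
    }

    // Scores every adjacent word pair seen at least minCount times.
    // Returns bigram -> {PMI, t-score}. Assumes tokens contain no spaces,
    // because the bigram key joins the two words with a single space.
    static Map<String, double[]> scoreBigrams(List<String> tokens, int minCount) {
        Map<String, Long> unigrams = new HashMap<>();
        Map<String, Long> bigrams = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            unigrams.merge(tokens.get(i), 1L, Long::sum);
            if (i + 1 < tokens.size()) {
                bigrams.merge(tokens.get(i) + " " + tokens.get(i + 1), 1L, Long::sum);
            }
        }
        // Simplification: the token count serves as N for all probabilities.
        long n = tokens.size();
        Map<String, double[]> scores = new HashMap<>();
        for (Map.Entry<String, Long> e : bigrams.entrySet()) {
            long c12 = e.getValue();
            if (c12 < minCount) {
                continue;
            }
            String[] parts = e.getKey().split(" ");
            long c1 = unigrams.get(parts[0]);
            long c2 = unigrams.get(parts[1]);
            scores.put(e.getKey(), new double[] {pmi(c12, c1, c2, n), tScore(c12, c1, c2, n)});
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of(
                "strong", "tea", "is", "strong", "tea", "and", "weak", "tea", "is", "weak");
        Map<String, double[]> scores = scoreBigrams(tokens, 2);

        // Rank candidate collocations by t-score, highest first.
        List<Map.Entry<String, double[]>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort((a, b) -> Double.compare(b.getValue()[1], a.getValue()[1]));
        for (Map.Entry<String, double[]> e : ranked) {
            System.out.printf("%-12s PMI=%.3f t=%.3f%n",
                    e.getKey(), e.getValue()[0], e.getValue()[1]);
        }
    }
}
```

A real comparison of association measures would also need a larger corpus, tokenisation, and a frequency threshold suited to the data; the toy input above only shows how the counts feed each formula.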

The program can be accessed in several ways:
- through a simple graphical interface provided by
kolokacje.standalone.SAMain and kolokacje.standalone.SAManager; this is
the easiest way to become familiar with the basic functions;
- by calling selected modules from the shell command line;
- by calling selected methods from your own Java program;
- by using kolokacje.server.PrettyPrinter and kolokacje.server.QueryServer to
build a web-based interface;
- by using kolokacje.server.PrettyPrinter to submit queries from a console and
then viewing the results in an HTML browser.

For more information and downloads, please see
http://www.mimuw.edu.pl/polszczyzna/kolokacje/index-en.htm

Kind regards,
Beata Wójtowicz
