Corpora: Re: Collocates

Daniel Ridings ridings at holding.gu.se
Mon Apr 17 15:23:01 UTC 2000


Dear Victoria,

You said the corpus doesn't matter :)

Lexilogik in Sweden has been working on exactly the kind of tool you asked
for. It is still under development and we make no promises, but it does
give interesting results ... for Swedish.

http://www.lexilogik.com/~sofia/conc.html

It lets you provide the pivot word, choose the size of the window around
that word, and pick one of three statistics for finding the collocates.
The only one of any real use is LLR (log-likelihood ratio).
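
For anyone who has not met LLR before, here is a minimal Python sketch of
how a window-based collocate ranking with the log-likelihood ratio could
look. It is not the code behind the tool above. The function names, the way
window positions are counted, and the filter that keeps only words that are
over-represented near the pivot are all assumptions made for illustration.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G2) for a 2x2 contingency table.

    k11: word near the pivot        k12: other words near the pivot
    k21: word elsewhere             k22: other words elsewhere
    """
    n = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22
    c1, c2 = k11 + k21, k12 + k22
    total = 0.0
    for obs, row, col in ((k11, r1, c1), (k12, r1, c2),
                          (k21, r2, c1), (k22, r2, c2)):
        if obs > 0:  # a zero cell contributes nothing
            total += obs * math.log(obs * n / (row * col))
    return 2.0 * total


def collocates(tokens, pivot, window=3, top=30):
    """Return up to `top` (word, score) pairs, highest score first."""
    # Positions within `window` tokens of any occurrence of the pivot.
    near_pos = set()
    for i, tok in enumerate(tokens):
        if tok == pivot:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            near_pos.update(range(lo, hi))

    near, far = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok == pivot:
            continue  # the pivot itself is not a candidate collocate
        (near if i in near_pos else far)[tok] += 1

    n_near, n_far = sum(near.values()), sum(far.values())
    scored = []
    for word, k11 in near.items():
        k21 = far[word]
        # Assumption: keep only words that are relatively more frequent
        # near the pivot than elsewhere (LLR itself is two-sided).
        if k11 * n_far > k21 * n_near:
            scored.append((word, llr(k11, n_near - k11, k21, n_far - k21)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]
```

Calling collocates(tokens, "tea", window=1) on a tokenised text returns
the candidate words with their scores, best first; the default of 30
mirrors the default list length mentioned in this message.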

The corpus is a tad over 30 million words. The result starts with a list
of the x most significant "friends", where x is 30 by default but is
something you can fiddle with. After that, concordance lines are returned,
sorted not alphabetically but in descending order of "significance". The
"collocates" are set out in bold.

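To make the ordering of the concordance lines concrete, here is a small
Python sketch. It is only a guess at the general idea, not the tool's
implementation: scoring a line by its highest-scoring collocate, marking
collocates with asterisks instead of bold, and bracketing the pivot are
all assumptions, as are the function and parameter names.

```python
def concordance(tokens, pivot, scores, window=3):
    """Return (score, line) pairs for each hit, most significant first.

    `scores` maps collocate words to a significance score (LLR, say).
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok != pivot:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        best = 0.0
        words = []
        for j in range(lo, hi):
            word = tokens[j]
            if j == i:
                words.append("[" + word + "]")   # the pivot itself
            elif word in scores:
                best = max(best, scores[word])   # assumption: line score
                words.append("*" + word + "*")   # stand-in for bold
            else:
                words.append(word)
        lines.append((best, " ".join(words)))
    # Descending by score; ties keep their order of appearance.
    lines.sort(key=lambda item: item[0], reverse=True)
    return lines
```

With the tokens of "weak tea and strong tea here", a window of 1, and
scores in which "strong" outranks "weak", the line containing "strong"
comes out first.
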
!!!Be warned: This is experimental. The results are returned as XML. If
you use Explorer 5, the XSL stylesheet will make sure you get a nice
presentation on your screen. All computation is done when you perform your
search. If your search word occurs about 60,000 times, it will probably
take a minute or two to get a result, if your browser can swallow 60,000
concordance lines, that is.

Ok, it's rough and ready, but we're having fun with it (which might cause
others problems ... it's a moving target).

> What I want to know is if anyone is aware of a web-site where I can enter a
> word and get the co-occurrences for that word. It doesn't really matter
> what Corpus.


Daniel Ridings
Managing Director
LexiLogik AB
Erik Dahlbergsgatan 11b 6tr
411 26 Göteborg, Sweden
Tel: 031 773 47 99 Fax: 070 610 46 18


