Corpora: web-search

Ken Litkowski ken at
Fri May 5 18:45:17 UTC 2000

I would hope that this tool, as you have configured it, will be useful
to lexicographers.  Might I suggest, in addition to the suggestions
already made, that the output options include a format like the one used
in Senseval, since many in the computational linguistics community have
used that format for word-sense disambiguation studies.

The format would be a line with an identifier and then up to three
sentences of the source text, with the last sentence containing the
bracketed target word.  Coverage need not be complete.  Run a simple
sentence-splitter over each corpus instance and see whether it yields a
usable set of sentences.  If not, just discard that instance.  The
result would be good training data for word-sense disambiguation.
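
The procedure described above could be sketched roughly as follows in
Python.  This is only an illustration under stated assumptions, not the
actual Senseval specification: the tab between identifier and text, the
square brackets around the target, the regular-expression splitter, and
the names split_sentences and format_instance are all hypothetical
choices, and the first occurrence of the target is taken as the one to
mark.

```python
import re

# Assumption: a "simple sentence-splitter" that breaks after ., ! or ?
# when followed by whitespace. It will mis-split on abbreviations.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences with a naive punctuation rule."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def format_instance(instance_id, context, target, max_sentences=3):
    """Return one output line, or None if the instance should be discarded.

    The line holds the identifier, a tab, and up to max_sentences
    sentences, the last of which contains the bracketed target word.
    """
    word = re.compile(r"\b%s\b" % re.escape(target), re.IGNORECASE)
    sentences = split_sentences(context)
    for i, sentence in enumerate(sentences):
        if word.search(sentence):
            # Keep at most (max_sentences - 1) sentences of left context.
            start = max(0, i - (max_sentences - 1))
            marked = word.sub(lambda m: "[" + m.group(0) + "]",
                              sentence, count=1)
            return instance_id + "\t" + " ".join(sentences[start:i] + [marked])
    # No sentence contains the target: discard this corpus instance.
    return None


if __name__ == "__main__":
    hit = ("The river was high. We walked along it. "
           "She sat on the bank and waited. Then it rained.")
    print(format_instance("bank.001", hit, "bank"))
```

Instances for which format_instance returns None are simply skipped,
which matches the suggestion to discard anything the splitter cannot
handle rather than try to be all-inclusive.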

Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken at
9208 Gue Road
Damascus, MD 20872-1025 USA       Home Page:
