Corpora: web-search

Ken Litkowski ken at clres.com
Fri May 5 18:45:17 UTC 2000


I hope this tool, as you have configured it, will be useful to
lexicographers.  Might I suggest, in addition to the suggestions already
made, that an output option include a format like that used in Senseval,
since many in the computational linguistics community have used that
format for word-sense disambiguation studies.

The format would be a line with an identifier followed by up to three
sentences of the source text, the last of which contains the bracketed
target word.  It wouldn't be crucial to include every instance.  Use a
simple sentence-splitter and see whether you can generate a set of
sentences; if not, just discard that corpus instance.  This would
provide great training data.
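The extraction step described above could be sketched roughly as follows.
This is a minimal Python illustration under stated assumptions, not the
official Senseval specification: the tab separator, the square brackets
around the target, the identifier scheme, the function name
senseval_line, and the naive regex sentence-splitter are all
hypothetical choices made for the example.

```python
import re

# Hypothetical naive sentence-splitter: break after ., ! or ? followed by whitespace.
_SENT_END = re.compile(r'(?<=[.!?])\s+')


def split_sentences(text):
    """Split text into sentences using the naive regex above."""
    return [s.strip() for s in _SENT_END.split(text.strip()) if s.strip()]


def senseval_line(instance_id, text, target, max_sents=3):
    """Build one instance line: id, tab, then up to max_sents sentences,
    the last of which contains the target word in square brackets.
    Returns None (i.e. discard the instance) if no sentence contains the target."""
    sents = split_sentences(text)
    # Match the target only as a whole word, so "bank" does not match "banking".
    pattern = re.compile(r'\b' + re.escape(target) + r'\b')
    for i, sent in enumerate(sents):
        if pattern.search(sent):
            # Keep up to max_sents - 1 preceding sentences plus the target sentence.
            context = sents[max(0, i - max_sents + 1): i + 1]
            # Bracket the first occurrence of the target in the final sentence.
            context[-1] = pattern.sub('[' + target + ']', context[-1], count=1)
            return instance_id + '\t' + ' '.join(context)
    return None
```

For example, senseval_line("web.001", "The river rose. We walked along
the bank. It was muddy.", "bank") yields the identifier, a tab, and
"The river rose. We walked along the [bank]."  Returning None rather
than raising an error mirrors the suggestion to simply drop instances
the splitter cannot handle.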

	Ken
--
Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken at clres.com
9208 Gue Road
Damascus, MD 20872-1025 USA       Home Page: http://www.clres.com


