[Corpora-List] SHARES document similarity system

Andrew Kehoe andrew at rdues.liv.ac.uk
Wed Mar 24 14:52:14 UTC 2004


Dear Colleague

For the past 3 years the Research and Development Unit for English
Studies has been working on an EPSRC-funded project called SHARES
(System of Hypermatrix Analysis, Retrieval, Evaluation and
Summarisation). The aim of the project was to test the hypothesis that
similar patterns of lexical repetition are sufficiently maintained
across differently authored documents on similar topics to support a
high-performance retrieval engine.

The project will be of interest to people working on document
similarity and applications of lexical cohesion. We have produced an
online demo system and user guide, and would appreciate your feedback:

         http://www.rdues.liv.ac.uk/sharesguide

This demo system uses a small test corpus made up of 11 topics, with 3
news articles on each topic. It allows the comparison of article pairs,
or of one article with all other articles in the test corpus. Stemming
and weighting options are available. The demo is a cut-down version of
our full SHARES software, designed for faster online access.
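
For readers unfamiliar with this kind of comparison, the following is a
minimal Python sketch of the two modes described above (one pair of
articles, or one article against all others), with optional stemming and
weighting. It is not the SHARES algorithm: it swaps in plain
bag-of-words cosine similarity, a crude suffix stripper and a smoothed
inverse-document-frequency weight purely for illustration. All function
names, parameters and the three-article corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text, stem=False):
    """Lower-case word tokens; optionally apply a crude suffix stripper."""
    words = re.findall(r"[a-z]+", text.lower())
    if stem:
        # Illustrative only: not a real stemmer and not the SHARES one.
        words = [re.sub(r"(ing|ed|es|s)$", "", w) if len(w) > 4 else w
                 for w in words]
    return words


def vectors(corpus, stem=False, weight=False):
    """Map each article name to a term vector (raw counts or idf-weighted)."""
    counts = {name: Counter(tokens(text, stem)) for name, text in corpus.items()}
    if not weight:
        return counts
    n = len(counts)
    # Document frequency: number of articles containing each term.
    df = Counter(term for c in counts.values() for term in c)
    return {
        name: {t: f * (math.log((1 + n) / (1 + df[t])) + 1) for t, f in c.items()}
        for name, c in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def compare_pair(first, second, corpus, stem=False, weight=False):
    """Similarity score for a single pair of articles."""
    vecs = vectors(corpus, stem, weight)
    return cosine(vecs[first], vecs[second])


def rank(query, corpus, stem=False, weight=False):
    """Compare one article with all others, most similar first."""
    vecs = vectors(corpus, stem, weight)
    scores = [(name, cosine(vecs[query], vec))
              for name, vec in vecs.items() if name != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical three-article corpus, invented for this sketch.
corpus = {
    "flood_a": "Floods hit the town as the river burst its banks after heavy rain",
    "flood_b": "Heavy rain caused the river to flood homes in the town",
    "budget_a": "The chancellor announced tax cuts in the budget speech",
}

for name, score in rank("flood_a", corpus, stem=True, weight=True):
    print(f"{name}: {score:.3f}")
```

Running the sketch prints the other articles ordered by their similarity
to the chosen one; switching the stem and weight flags shows how those
options shift the scores.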

An anonymous feedback form is provided on our website for your use:
http://www.rdues.liv.ac.uk/sfeedback.shtml. You may send comments by
email to andrew at rdues.liv.ac.uk if you prefer.

Thank you in advance

Andrew Kehoe
Research and Development Unit for English Studies
University of Liverpool
http://www.rdues.liv.ac.uk
WebCorp: http://www.webcorp.org.uk


