<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>



<TITLE>[Corpora-List] English-language paraphrase corpora</TITLE>

</HEAD>

<BODY dir=ltr>

<DIV>

<TABLE class=tblMsgBody lang=EN-US cellPadding=6 width="100%" border=0>

  
  <TR>

    <TD vAlign=top width="100%" height=300>

      <DIV class=Section1>

      <DIV>

      <P class=MsoNormal><FONT face="Times New Roman" 

      size=3><SPAN>Olga,</SPAN></FONT></P></DIV>

      <DIV>

      <P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>We have
      developed an automated system called SHARES (System of
      Hypermatrix Analysis, Retrieval, Evaluation and Summarisation) that
      clusters related documents in a general news corpus and ranks them in
      order of similarity, using a method that grew out of earlier work,
      including work on lexical cohesion. SHARES identifies topic at a series
      of levels and is more linguistically refined in its approach than some
      other systems. There is a small demo online at <A
      href="http://www.rdues.uce.ac.uk/sharesguide/"
      target=_BLANK>www.rdues.uce.ac.uk/sharesguide/</A>, along with a user
      guide. Clustering techniques of this kind can be applied to a pair of
      general corpora from different sources to extract the kind of comparable
      sets you are interested in.</SPAN></FONT></P></DIV>

      <DIV>

      <P class=MsoNormal><FONT face="Times New Roman" 

      size=3><SPAN>Yours</SPAN></FONT></P>

      <P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>Jay</SPAN></FONT></P></DIV>

      <DIV>

      <P class=MsoNormal><FONT face="Times New Roman" 

      size=3><SPAN></SPAN></FONT> </P></DIV>

      <DIV>

      <P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>Jay 

      Banerjee</SPAN></FONT></P></DIV>

      <DIV>

      <P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>Research and 

      Development Unit for English Studies</SPAN></FONT></P></DIV>

      <DIV>

      <P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>University of
      Central England, Birmingham</SPAN></FONT></P></DIV>

      <DIV>

      <P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN><A 

      href="http://www.rdues.uce.ac.uk" 

      target=_BLANK>http://www.rdues.uce.ac.uk</A></SPAN></FONT></P></DIV></DIV></TD></TR></TABLE></DIV>

<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">

  <DIV><FONT size=2>-----Original Message----- <BR><B>From:</B> 

  owner-corpora@lists.uib.no on behalf of Olga Shaumyan 

  <BR><B>Sent:</B> Mon 31/01/2005 23:06 <BR><B>To:</B> corpora@uib.no 

  <BR><B>Cc:</B> <BR><B>Subject:</B> [Corpora-List] English-language paraphrase 

  corpora<BR><BR></FONT></DIV><BR>

  <P><FONT size=2>Dear All,</FONT> </P>

  <P><FONT size=2>I am looking for English-language "comparable" corpora: for
  example, two collections of articles from different sources describing the
  same events.</FONT> </P>

  <P><FONT size=2>Alternatively, would anyone know off-hand how one would go
  about constructing such comparable collections?</FONT> </P>

  <P><FONT size=2>(This is to be used for automatic paraphrasing.)</FONT> </P>

  <P><FONT size=2>Any pointers greatly appreciated,</FONT> </P>

  <P><FONT size=2>Olga</FONT> <BR><FONT size=2>University of Sussex NLP 

  group</FONT> </P></BLOCKQUOTE>


</BODY>

</HTML>