<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>[Corpora-List] English-language paraphrase corpora</TITLE>
</HEAD>
<BODY dir=ltr>
<DIV>
<TABLE class=tblMsgBody lang=EN-US cellPadding=6 width="100%" border=0>
<TR>
<TD vAlign=top width="100%" height=300>
<DIV class=Section1>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman"
size=3><SPAN>Olga,</SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>We have
developed an automated system called SHARES (System of
Hypermatrix Analysis, Retrieval, Evaluation and Summarisation) that
clusters related documents in a general news corpus and ranks them in
order of similarity, using a method that builds on earlier work, including
work on lexical cohesion. SHARES identifies topic at a series of levels and is
more linguistically refined in its approach than some other systems. A small
demo, with a user guide, is online at <A
href="http://www.rdues.uce.ac.uk/sharesguide/"
target=_BLANK>www.rdues.uce.ac.uk/sharesguide/</A>. Such clustering
techniques can be applied to a pair of general corpora from
different sources to extract the kind of sets you are
interested in.</SPAN></FONT></P></DIV>
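<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>As a rough
illustration of pairing documents across two sources by similarity, here is a
minimal sketch. It assumes plain bag-of-words cosine similarity, which is a
generic stand-in and not the SHARES method. The function names, the 0.3
threshold and the sample headlines are all hypothetical.</SPAN></FONT></P>
<PRE>
```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pair_documents(source_a, source_b, threshold=0.3):
    """For each document in source_a, keep its most similar document
    in source_b when the similarity reaches the threshold
    (the threshold value is an illustrative assumption)."""
    vectors_b = [bag_of_words(doc) for doc in source_b]
    pairs = []
    for i, doc in enumerate(source_a):
        vec = bag_of_words(doc)
        scores = [cosine(vec, other) for other in vectors_b]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    # Rank the candidate pairs from most to least similar.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical headlines standing in for two news sources.
agency_one = [
    "Storm closes airport as heavy snow hits the city",
    "Central bank raises interest rates again",
]
agency_two = [
    "Interest rates raised by the central bank",
    "Heavy snow forces city airport to close",
]

pairs = pair_documents(agency_one, agency_two)
for i, j, score in pairs:
    print(f"{score:.2f}  {agency_one[i]!r}  ~  {agency_two[j]!r}")
```
</PRE>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>Each resulting
pair is a candidate set of articles on the same event. In practice one would
probably also restrict candidates by publication date and use weighted terms
rather than raw counts.</SPAN></FONT></P></DIV>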
<DIV>
<P class=MsoNormal><FONT face="Times New Roman"
size=3><SPAN>Yours</SPAN></FONT></P>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>Jay</SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman"
size=3><SPAN></SPAN></FONT> </P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>Jay
Banerjee</SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>Research and
Development Unit for English Studies</SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN>University of
Central England, Birmingham</SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN><A
href="http://www.rdues.uce.ac.uk"
target=_BLANK>http://www.rdues.uce.ac.uk</A></SPAN></FONT></P></DIV></DIV></TD></TR></TABLE></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV><FONT size=2>-----Original Message----- <BR><B>From:</B>
owner-corpora@lists.uib.no on behalf of Olga Shaumyan
<BR><B>Sent:</B> Mon 31/01/2005 23:06 <BR><B>To:</B> corpora@uib.no
<BR><B>Cc:</B> <BR><B>Subject:</B> [Corpora-List] English-language paraphrase
corpora<BR><BR></FONT></DIV><BR>
<P><FONT size=2>Dear All,</FONT> </P>
<P><FONT size=2>I am looking for English-language "comparable" corpora, i.e. I
want,</FONT> <BR><FONT size=2>e.g., two collections of articles from different
sources describing the same events.</FONT> </P>
<P><FONT size=2>Alternatively, would anyone know off-hand how one would go
about </FONT><BR><FONT size=2>constructing such comparable collections?</FONT>
</P>
<P><FONT size=2>(This is to be used for automatic paraphrasing.)</FONT> </P>
<P><FONT size=2>Any pointers greatly appreciated,</FONT> </P>
<P><FONT size=2>Olga</FONT> <BR><FONT size=2>University of Sussex NLP
group</FONT> </P></BLOCKQUOTE>
</BODY>
</HTML>