[Corpora-List] English-language paraphrase corpora

Paul Clough p.d.clough at sheffield.ac.uk
Wed Feb 2 10:56:18 UTC 2005


Dear all,

I have a collection of around 1800 news agency and newspaper texts, written by
trained journalists and collected specifically for analysing text reuse within
journalism. This collection, the METER corpus, is currently available for
research use and can be obtained by contacting either Prof. Rob Gaizauskas
(robertg at dcs.shef.ac.uk) or myself. For each agency text, the corpus contains
up to 9 UK national newspaper versions (both tabloid and broadsheet), each
categorised as derived or not derived from the agency version. You can find
more information about text reuse in journalism in my thesis (downloadable
from http://ir.shef.ac.uk/cloughie/papers.html) and on the METER web page:
http://www.dcs.shef.ac.uk/nlp/meter/

Regards,

Paul.


-------------------------------------------
Dr. Paul Clough
Dept. Information Studies
University of Sheffield

+44 (0)114 2222664
-------------------------------------------



Quoting radev at umich.edu:

> Our system, a precursor to Google News, is also active on the Web:
>
> www.newsinessence.com
>
> Using it, we have collected 50,000 or so clusters of related news.
>
> --
> Drago
>
>
> nielsen at dcs.kcl.ac.uk wrote:
> >
> >
> > If you don't mind collecting raw text, news.google.com does this.
> >
> > Leif
> >
> > >
> > > Dear All,
> > >
> > > I am looking for English-language "comparable" corpora, i.e. I want,
> > > e.g., 2 collections of articles from different sources describing the
> > > same events.
> > >
> > > Alternatively, would anyone know off-hand how one would go about
> > > constructing such comparable collections?
> > >
> > > (This is to be used for automatic paraphrasing.)
> > >
> > > Any pointers greatly appreciated,
> > >
> > > Olga
> > > University of Sussex NLP group
>
> --
> Dragomir R. Radev                                         radev at umich.edu
> Assistant Professor of Information, Electrical Engineering and
> Computer Science, and Linguistics, the University of Michigan, Ann Arbor
> Phone: 734-615-5225   Fax: 734-764-2475    http://www.si.umich.edu/~radev
>
>


