[Corpora-List] Texts with keywords for supervised learning

Lee, David dvdlee at umich.edu
Thu Jan 16 16:43:30 UTC 2003


William's correct. In fact, when working on my BNC Index, I manually copied and pasted keywords from the COPAC library catalogue system into the BNC Index spreadsheet (so they are easily retrievable). Most BNC texts that were taken from published books therefore have library keywords associated with them. All that's needed is a licence for the BNC World Edition.

Hope this helps.

Dave.
___________________________________________________
David YW Lee
dvdlee at umich.edu
Research Fellow, MICASE project
English Language Institute, University of Michigan
TCF Building, 401 E. Liberty, Suite 350, Rm 3140
Ann Arbor, Michigan 48104-2298, USA. Tel: +1 734-615-9638 (O)

MICASE web site: http://www.lsa.umich.edu/eli/micase/micase.htm
Corpus-based Linguistics web site: http://devoted.to/corpora
___________________________________________________


> -----Original Message-----
> From: William Mann [mailto:bill_mann at sil.org]
> Sent: Thu, January 16, 2003 11:26 AM
> To: Anette Hulth; corpora at hit.uib.no
> Subject: Re: [Corpora-List] Texts with keywords for supervised learning
> 
> 
> My impression is that many library catalogs are really this sort of corpus,
> except that the texts are on the shelves.
> 
> Perhaps catalogs of items that are available on line could be converted into
> being this sort of corpus.
> 
> Bill Mann
> 
> ----- Original Message -----
> From: "Anette Hulth" <hulth at dsv.su.se>
> To: <corpora at hit.uib.no>
> Sent: Thursday, January 16, 2003 9:41 AM
> Subject: [Corpora-List] Texts with keywords for supervised learning
> 
> 
> > Dear list members,
> >
> > I'm currently doing experiments on keyword derivation,
> > treating it as a supervised learning task. (By keywords
> > I mean a set of, say, 3-15 words reflecting the content
> > of the actual text.) I wonder if there is anybody who's
> > aware of any freely available corpus of text documents
> > in English, with manually assigned keywords that may
> > be (automatically) extracted. Any pointers will be much
> > appreciated!
> >
> > Kind regards
> >     /Anette Hulth
> >
> > ---------------------------------------------------
> >   Anette Hulth
> >   Dept. of Computer and Systems Sciences
> >   Stockholm University / KTH
> >   Sweden
> > ---------------------------------------------------
> >
> >
> >
> >
> >
> 
> 
> 


