[Corpora-List] free tagged corpus

Delip Rao deliprao at yahoo.com
Thu Nov 17 19:05:33 UTC 2005


Dear Martin/All,

By "free" I meant $0, not "freedom". As a research
student I would be willing to comply with the
legal/ethical restrictions etc. Most standard
literature at good conferences uses corpora from
sources like LDC, which are not available free of cost.
If my organization is not a member of LDC then I would
not have access to these. Are there any free-of-cost
PoS-tagged corpora for experimentation that are well
accepted by the research community?

Thanks,
Delip

--- Martin Wynne <martin.wynne at oucs.ox.ac.uk> wrote:

> Dear Delip,
> 
> It depends on what you mean by 'freely available'.
> This has (at least) 
> two meanings in this context. It can mean free of
> cost, or it can mean 
> free of legal or ethical restrictions on its use.
> 
> Many corpora do not cost money to use, although
> the ones mentioned 
> so far in this thread, such as the BNC and resources
> from the LDC, do 
> cost money.
> 
> As for legal and ethical restrictions, it may be
> useful to look at what 
> they say in the world of software, where several
> levels of freedom can 
> be differentiated:
> 
>      *  The freedom to run the program, for any
> purpose (freedom 0).
>      * The freedom to study how the program works,
> and adapt it to your 
> needs (freedom 1). Access to the source code is a
> precondition for this.
>      * The freedom to redistribute copies so you can
> help your neighbor 
> (freedom 2).
>      * The freedom to improve the program, and
> release your improvements 
> to the public, so that the whole community benefits
> (freedom 3). Access 
> to the source code is a precondition for this.
> 
> (from http://www.gnu.org/philosophy/free-sw.html)
> 
> With corpora, a parallel classification may be
> possible:
> 
>      * The freedom to access and analyse the corpus
> (freedom 0).
>      * The freedom to run your own tools on the
> corpus, and adapt it to 
> your needs (freedom 1). Access to the full text of
> the corpus is a 
> precondition for this.
>      * The freedom to redistribute copies so you can
> help your neighbor 
> (freedom 2).
>      * The freedom to add texts or metadata or
> annotations, and release 
> your improvements to the public, so that the whole
> community benefits 
> (freedom 3).
> 
> In most cases, any of the above freedoms may be
> restricted by only 
> allowing the relevant freedoms in the context of
> academic or 
> non-commercial research, though the precise terms of
> these restrictions 
> may vary, and the boundaries of non-commercial may
> not be easy to draw.
> 
> Usually a corpus creator cannot simply release a
> corpus under terms of 
> their choosing, allowing whichever of the above
> freedoms they want to, 
> because they don't own the rights over all of the
> texts contained in the 
> corpus. A corpus usually contains texts written or
> spoken by various 
> people, and these people, or publishers, or
> employers, or others, are 
> likely to have intellectual property rights over
> these texts. 
> (Furthermore, the corpus builders acquire rights
> over the 
> collection, but these may reside not in the
> individuals but in their 
> institution or funders). To complicate things
> further, the relevant laws 
> relating to these rights vary in different
> countries, and have varied 
> over time.
> 
> My colleague Lou Burnard asked a similar question on
> this list in 
> January this year. You can see the start of the
> thread in the archive at
> http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0501&L=CORPORA&D=0&I=-3&P=13048
> He was surprised to find virtually nothing which
> could be distributed 
> under something like an open source software
> licence.
> 
> The simplest answer to this is that you have to say
> a bit more precisely 
> what it is you want to be free to do with the
> corpus, and then maybe 
> you'll get some more answers.
> 
> Best wishes,
> Martin
> 
> 
> Delip Rao wrote:
> > Hello All,
> > 
> > Is there any freely available part-of-speech tagged
> > corpus for research/non-commercial use?
> > 
> > Thanks,
> > Delip Rao
> > -----------
> > AIDB LAB,
> > IIT MADRAS
> > 
> > 
> > 
> 
> 
> -- 
> Martin Wynne
> Head of the Oxford Text Archive and
> AHDS Literature, Languages and Linguistics
> 
> Oxford University Computing Services
> 13 Banbury Road
> Oxford
> UK - OX2 6NN
> Tel: +44 1865 283299
> Fax: +44 1865 273275
> martin.wynne at oucs.ox.ac.uk
> 



	
	
		


