[Corpora-List] free tagged corpus

Kristofer Franzén franzen at sics.se
Thu Nov 17 19:32:29 UTC 2005


In what language?

/Kristofer Franzén



Delip Rao wrote:

>Dear Martin/All,
>
>By "free" I meant $0, not "freedom". As a research
>student I would be willing to comply with the
>legal/ethical restrictions etc. Most papers at
>good conferences use corpora from sources like the
>LDC, which are not available free of cost. If my
>organization is not a member of the LDC then I would
>not have access to these. Are there any free-of-cost
>PoS-tagged corpora for experimentation that are well
>accepted by the research community?
>
>Thanks,
>Delip
>
>--- Martin Wynne <martin.wynne at oucs.ox.ac.uk> wrote:
>
>>Dear Delip,
>>
>>It depends on what you mean by 'freely available'.
>>This has (at least) 
>>two meanings in this context. It can mean free of
>>cost, or it can mean 
>>free of legal or ethical restrictions on its use.
>>
>>Many corpora do not cost money to use, although
>>the ones mentioned so far in this thread, such as
>>the BNC and resources from the LDC, do cost money.
>>
>>As for legal and ethical restrictions, it may be
>>useful to look at what 
>>they say in the world of software, where several
>>levels of freedom can 
>>be differentiated:
>>
>>     *  The freedom to run the program, for any
>>purpose (freedom 0).
>>     * The freedom to study how the program works,
>>and adapt it to your 
>>needs (freedom 1). Access to the source code is a
>>precondition for this.
>>     * The freedom to redistribute copies so you can
>>help your neighbor 
>>(freedom 2).
>>     * The freedom to improve the program, and
>>release your improvements 
>>to the public, so that the whole community benefits
>>(freedom 3). Access 
>>to the source code is a precondition for this.
>>
>>(from http://www.gnu.org/philosophy/free-sw.html)
>>
>>With corpora, a parallel classification may be
>>possible:
>>
>>     * The freedom to access and analyse the corpus
>>(freedom 0).
>>     * The freedom to run your own tools on the
>>corpus, and adapt it to 
>>your needs (freedom 1). Access to the full text of
>>the corpus is a 
>>precondition for this.
>>     * The freedom to redistribute copies so you can
>>help your neighbor 
>>(freedom 2).
>>     * The freedom to add texts or metadata or
>>annotations, and release 
>>your improvements to the public, so that the whole
>>community benefits 
>>(freedom 3).
>>
>>In most cases, any of the above freedoms may be
>>restricted by only 
>>allowing the relevant freedoms in the context of
>>academic or 
>>non-commercial research, though the precise terms of
>>these restrictions 
>>may vary, and the boundaries of non-commercial may
>>not be easy to draw.
>>
>>Usually a corpus creator cannot simply release a
>>corpus under terms of 
>>their choosing, allowing whichever of the above
>>freedoms they want to, 
>>because they don't own the rights over all of the
>>texts contained in the 
>>corpus. A corpus usually contains texts written or
>>spoken by various 
>>people, and these people, or publishers, or
>>employers, or others, are 
>>likely to have intellectual property rights over
>>these texts. 
>>(Furthermore, the corpus builders acquire rights
>>over the collection, but these may reside not in the
>>individuals but in their institution or funders.)
>>To complicate things further, the relevant laws
>>relating to these rights vary between countries,
>>and have varied over time.
>>
>>My colleague Lou Burnard asked a similar question on
>>this list in 
>>January this year. You can see the start of the
>>thread in the archive at
>>
>>http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0501&L=CORPORA&D=0&I=-3&P=13048
>>
>>He was surprised to find virtually nothing which
>>could be distributed 
>>under something like an open source software
>>licence.
>>
>>The simplest answer to this is that you have to say
>>a bit more precisely 
>>what it is you want to be free to do with the
>>corpus, and then maybe 
>>you'll get some more answers.
>>
>>Best wishes,
>>Martin
>>
>>
>>Delip Rao wrote:
>>
>>>Hello All,
>>>
>>>Is there any freely available part-of-speech tagged
>>>corpus for research/non-commercial use?
>>>
>>>Thanks,
>>>Delip Rao
>>>-----------
>>>AIDB LAB,
>>>IIT MADRAS
>>>
>>>
>>-- 
>>Martin Wynne
>>Head of the Oxford Text Archive and
>>AHDS Literature, Languages and Linguistics
>>
>>Oxford University Computing Services
>>13 Banbury Road
>>Oxford
>>UK - OX2 6NN
>>Tel: +44 1865 283299
>>Fax: +44 1865 273275
>>martin.wynne at oucs.ox.ac.uk
>>


