[Corpora-List] free tagged corpus
Delip Rao
deliprao at yahoo.com
Thu Nov 17 19:05:33 UTC 2005
Dear Martin/All,
By "free" I meant $0, not "freedom". As a research
student I would be willing to comply with the
legal/ethical restrictions etc. Most of the standard
literature at good conferences uses corpora from
sources like the LDC, which are not available free of cost.
If my organization is not a member of the LDC, then I would
not have access to these. Are there any free-of-cost
PoS-tagged corpora for experimentation that are well
accepted by the research community?
Thanks,
Delip
--- Martin Wynne <martin.wynne at oucs.ox.ac.uk> wrote:
> Dear Delip,
>
> It depends on what you mean by 'freely available'. This has (at least)
> two meanings in this context. It can mean free of cost, or it can mean
> free of legal or ethical restrictions on its use.
>
> Many corpora do not cost money to use, although the ones mentioned
> so far in this thread, such as the BNC and resources from the LDC, do
> cost money.
>
> As for legal and ethical restrictions, it may be useful to look at what
> they say in the world of software, where several levels of freedom can
> be differentiated:
>
>  * The freedom to run the program, for any purpose (freedom 0).
>  * The freedom to study how the program works, and adapt it to your
>    needs (freedom 1). Access to the source code is a precondition for
>    this.
>  * The freedom to redistribute copies so you can help your neighbor
>    (freedom 2).
>  * The freedom to improve the program, and release your improvements
>    to the public, so that the whole community benefits (freedom 3).
>    Access to the source code is a precondition for this.
>
> (from http://www.gnu.org/philosophy/free-sw.html)
>
> With corpora, a parallel classification may be possible:
>
>  * The freedom to access and analyse the corpus (freedom 0).
>  * The freedom to run your own tools on the corpus, and adapt it to
>    your needs (freedom 1). Access to the full text of the corpus is a
>    precondition for this.
>  * The freedom to redistribute copies so you can help your neighbor
>    (freedom 2).
>  * The freedom to add texts or metadata or annotations, and release
>    your improvements to the public, so that the whole community
>    benefits (freedom 3).
>
> In most cases, any of the above freedoms may be restricted by only
> allowing the relevant freedoms in the context of academic or
> non-commercial research, though the precise terms of these restrictions
> may vary, and the boundaries of 'non-commercial' may not be easy to
> draw.
>
> Usually a corpus creator cannot simply release a corpus under terms of
> their choosing, allowing whichever of the above freedoms they want to,
> because they don't own the rights over all of the texts contained in
> the corpus. A corpus usually contains texts written or spoken by various
> people, and these people, or publishers, or employers, or others, are
> likely to have intellectual property rights over these texts.
> (Furthermore, the corpus builders acquire rights over the collection,
> but these may reside not in the individuals but in their institution or
> funders.) To complicate things further, the relevant laws relating to
> these rights vary in different countries, and have varied over time.
>
> My colleague Lou Burnard asked a similar question on this list in
> January this year. You can see the start of the thread in the archive at
> http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0501&L=CORPORA&D=0&I=-3&P=13048
> He was surprised to find virtually nothing which could be distributed
> under something like an open source software licence.
>
> The simplest answer to this is that you have to say a bit more
> precisely what it is you want to be free to do with the corpus, and
> then maybe you'll get some more answers.
>
> Best wishes,
> Martin
>
>
> Delip Rao wrote:
> > Hello All,
> >
> > Is there any freely available part-of-speech tagged
> > corpus for research/non-commercial use?
> >
> > Thanks,
> > Delip Rao
> > -----------
> > AIDB LAB,
> > IIT MADRAS
>
> --
> Martin Wynne
> Head of the Oxford Text Archive and
> AHDS Literature, Languages and Linguistics
>
> Oxford University Computing Services
> 13 Banbury Road
> Oxford
> UK - OX2 6NN
> Tel: +44 1865 283299
> Fax: +44 1865 273275
> martin.wynne at oucs.ox.ac.uk
>