[Corpora-List] text mining of full text articles/books

John McNaught John.McNaught at manchester.ac.uk
Fri Sep 30 10:53:28 UTC 2011


Dear Corpora list colleagues,

This mail is primarily addressed to academic researchers, and relates to
proposals by the UK Government to introduce legislation to create a
copyright exception for non-commercial text and data mining. Although
the request below focuses on text and data mining, I would also be
interested to hear about similar experiences in research into other
aspects of NLP/CL, e.g. corpus linguistics, machine translation,
multidocument summarisation, ...

I would be interested to hear from any academic researcher who has
attempted to obtain access to published content (especially full text
articles and books) for research purposes involving text mining or data
mining, and has not been successful in obtaining such access. Brief
details are perfectly OK, e.g.

Institution:
Research envisaged:  (very brief generic indication e.g. 1 sentence)
Reason(s) for failure to obtain access:
e.g. (by no means a closed list)
* blanket refusal
* read licensing conditions and gave up at that point (which particular
conditions presented barriers?)
* protracted negotiations leading nowhere, life is too short, gave up
* would have had to contact too many publishers to seek permission
* could not feasibly assign individual author attribution especially in
data mining phases
* payment requested even though your institution subscribes to the
journals or the e-books
* could not release results to the community, or build services on those
results for the community, so not worthwhile to pursue
	A special case of this is: got access only within the context of a
collaborative research project involving the publisher as a data
provider, but could not use content or results outside that project for
the benefit of the community
* format or broker issues (told "you have access already via your
institutional subscriptions" but this turns out to be access for humans
via some intermediate application that prevents or hinders text mining)

Any information of the above kind would be very welcome in helping to
form an evidence base.

John McNaught

-- 
John McNaught                   John.McNaught at manchester.ac.uk
School of Computer Science

and
Deputy Director
National Centre for Text Mining
Manchester Interdisciplinary Biocentre
University of Manchester
131 Princess Street                      tel: +44.161.306.3098
Manchester                               fax: +44.161.306.5201
M1 7DN                                   web: www.nactem.ac.uk
UK                                            www.textminingcentre.ac.uk




