Corpora: text collection difficulties

Janice McAlpine jm27 at post.queensu.ca
Thu Nov 2 19:26:49 UTC 2000



Dear Colleagues,
    I am in charge of the development of the Strathy Corpus
of Canadian English, at Queen's University in Canada.  This corpus now
contains about 14 million words of published Canadian writing,
carefully edited to mirror published hard copy. It is supplemented
by hundreds of millions of words of newspaper writing on CD-ROM.  The
Strathy corpus has been created to study contemporary Canadian usage.
    The corpus was begun in 1981, at which time computers were
a novelty and the word "Internet" did not exist.  Writers and
publishers were often honoured and eager to have their work
consigned to an electronic repository devoted to the study of
Canadian English.  Now writers and publishers are extremely
wary of giving permission to reproduce their works in electronic
form.  For one thing, they fear piracy.  They also feel they should
be paid.  Newspapers and broadcasters now have commercial partners
which exist specifically to exploit the market for searchable
versions of news media.  Therefore, newspapers are no longer giving us
last year's CD-ROMs.  Also, just this year, Cancopy, Canada's
centralized copyright release clearinghouse, has announced
that it will handle requests from universities to place
authors' texts in electronic reserves and LANs.
     The upshot of all this is that I no longer think I can obtain
free published texts at the rate at which we need them, unless I
make a text solicitation campaign my full-time job (and I have many
other duties!).  Am I just losing my touch, or have others
found that the temper of the times has changed regarding
text donation?
     All suggestions and comments are welcome.
Thanks,

Janice McAlpine  Contact me at   jm27 at post.queensu.ca
Director, Strathy Language Unit
Department of English
Queen's University
Kingston, On
Canada


