6.710, FYI: British Corpus, LSA Address List, Citations, Apology
The Linguist List
linguist at tam2000.tamu.edu
Sun May 21 15:30:34 UTC 1995
----------------------------------------------------------------------
LINGUIST List: Vol-6-710. Sun 21 May 1995. ISSN: 1068-4875. Lines: 247
Subject: 6.710, FYI: British Corpus, LSA Address List, Citations, Apology
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu>
Assoc. Editor: Ljuba Veselinova <lveselin at emunix.emich.edu>
Asst. Editors: Ron Reck <rreck at emunix.emich.edu>
Ann Dizdar <dizdar at tam2000.tamu.edu>
Annemarie Valdez <avaldez at emunix.emich.edu>
-------------------------Directory-------------------------------------
1)
Date: Thu, 18 May 1995 17:41:03 +0100
From: British National Corpus (natcorp at vax.ox.ac.uk)
Subject: British National Corpus: First Release
2)
Date: Tue, 16 May 95 08:46:24 -0400
From: anderson at sapir.ling.yale.edu
Subject: LSA email address list
3)
Date: Tue, 16 May 1995 09:52:00 -0700
From: ervintr1 at violet.berkeley.edu
Subject: electronic citations
4)
Date: Mon, 15 May 1995 08:53:38 +1000
From: Bert.Peeters at modlang.utas.edu.au (Bert Peeters)
Subject: Re: 6.683, Burmese - An apology
-------------------------Messages--------------------------------------
1)
Date: Thu, 18 May 1995 17:41:03 +0100
From: British National Corpus (natcorp at vax.ox.ac.uk)
Subject: British National Corpus: First Release
*********** BRITISH NATIONAL CORPUS DISTRIBUTION BEGINS **************
On behalf of the BNC Consortium, OUCS is very happy to announce that we
expect to start distributing copies of the long-awaited British
National Corpus to licence holders during the week beginning 22 May.
This corpus is the end-product of a unique three-year collaboration,
involving Oxford University Press, Longman, Chambers-Harrap, Oxford
University Computing Services, Lancaster University and the British
Library, with funding from the DTI and SERC. It contains 100 million
words, from over 4000 different texts carefully selected to give maximal
coverage of the varieties of modern British English, both spoken and
written. The corpus is automatically tagged for part of speech, using
the CLAWS stochastic parser developed at UCREL, and marked up in SGML,
following the TEI Guidelines for corpus encoding.
The corpus is currently available under academic licence within the
European Union only. The first release, comprising three CDs and a
detailed technical manual, currently costs under 200 pounds.
A full installation occupies about 4 Gb of disk space, and can only be
carried out on a Unix system. Later this year we hope to announce
availability of the BNC Sampler: a 2 million word sample from the
corpus, using an enhanced word-class tagset, manually corrected. This
Sampler will be usable on a standalone PC.
* * * * * * * * * * * PRICE RISE IMMINENT * * * * * * * * * * *
* Our original budget for the cost of producing the BNC CDs *
* was based on the assumption that the whole thing would fit *
* onto two CDs. In the event, we needed three. However, we *
* are holding the price at the original estimate of 150 pounds *
* (plus VAT) until 1st JULY 1995. *
* *
* Orders received after 1st July 1995 will be charged at the *
* full price of 220 pounds plus VAT. We apologise for any *
* inconvenience this may cause. *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
For full details, including ordering and licensing information, please
see our web pages at http://info.ox.ac.uk/bnc or write to the address
below.
-----------------------------------------------
B r i t i s h   N a t i o n a l   C o r p u s
-----------------------------------------------
FREQUENTLY ASKED QUESTIONS
last update: 15 May 95
***
Q. What's in the BNC?
A. Extracts from 4124 modern British English texts of all kinds, both
spoken and written. Each text is segmented into orthographic sentence
units, and each word automatically assigned a part of speech code. There
are six and a quarter million sentences, and over 100 million words.
Q. Where did it come from?
A. It was produced by a consortium of leading dictionary publishers
(OUP, Longman, Chambers-Harrap) and academic research centres (Oxford
University Computing Services, Unit for Computer Research in the
English Language at Lancaster University, British Library Research and
Development), with funding from the DTI, SERC, and the British Academy.
It has taken three and a half years to complete.
Q. What use is it?
A. It provides a unique and authoritative view of the state of the English
language today, with carefully balanced representation of as many
different varieties of English as possible. It can be used to
exercise NLP systems of all kinds, as a fertile source of real life
examples for language learners, or simply to explore the way the
language is currently used.
Q. What do I have to do to get a copy of it?
A. If you want to use the corpus solely for purposes of academic
research, all you have to do is agree to the terms of the licence. If
you want to use it for other purposes, we will refer your request to the
BNC Consortium, who will discuss licensing arrangements with you.
Q. How much does it cost?
A. BNC Release 1.0 costs a total of 220 pounds, but we are holding to
the originally announced price of 150 pounds until 1 JULY 1995. All
prices are exclusive of VAT.
Q. What do I get for my money?
A. The first release of the BNC comprises:
-- the full text of the 100 million word corpus
-- printed and online documentation
-- a full word index to the whole corpus
-- ANSI C source code for the SARA server program and for a simple
SARA client program
packaged as 3 CD-ROMs. The initial academic licence is valid for
five years.
Q. What kind of computer system will I need to use it?
A. You can unpack the distribution CDs on any Unix system capable of
reading ISO 9660 format. The corpus texts alone occupy nearly
2 Gb unpacked. The SARA index occupies a further 2 Gb.
The BNC is an SGML document complying with ISO 8879.
Q. How can I order a copy?
A. You will need to get a copy of the order form and two copies of the
licence. You can download these from our Web site or request them
from the address below.
Q. What are the licensing conditions?
A. The licence says you can use the corpus for any non-commercial
purposes, subject to the "fair-dealing" provisions of the Copyright
Act. At present, you must be located in a member state of the
EU. There are also a number of other conditions designed to protect the
owners of IPR in the corpus contents and the interests of the
commercial partners in the BNC Consortium.
Q. Is it available online?
A. Not yet. We have been running an experimental online service
for some months, but the software is not yet ready for release.
Watch this space for further announcements!
--------------------------------------------------------
British National Corpus
Oxford University Computing Services
13 Banbury Road
Oxford OX2 6NN
http://info.ox.ac.uk/bnc
tel +44 (1865) 273 280
fax +44 (1865) 273 275
natcorp at oucs.ox.ac.uk
----------------------------------------------------------
--------------------------------------------------------------------------
2)
Date: Tue, 16 May 95 08:46:24 -0400
From: anderson at sapir.ling.yale.edu
Subject: LSA email address list
The LSA maintains a list which contains the information supplied by
members about their e-mail addresses. This address list can now be
obtained by anonymous ftp to sapir.ling.yale.edu, where it will be
found in pub/LSA_email_list.txt. You can also get it by simply
clicking on the corresponding link on the Yale Linguistics WWW page:
(http://www.cis.yale.edu/linguist/).
This list was recently updated to reflect information available as of
10 May, 1995. If you retrieved the list before 16 May, 1995, you may
want to get the new version. We will update the list whenever the LSA
Secretariat sends us a new version; the date of the current version
will be found on the WWW page (as well as in the first line of the
file itself).
--Steve Anderson
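
The retrieval described above could be scripted roughly as follows. This is
only a sketch, not part of the original announcement: it assumes the host
still accepts anonymous ftp and that the file still sits at the path given
in the message. The function names (split_remote_path, fetch_lsa_list,
first_line) are hypothetical helpers introduced for illustration.

```python
from ftplib import FTP

# Host and path as given in the announcement; whether the server is still
# reachable is an assumption.
HOST = "sapir.ling.yale.edu"
REMOTE_PATH = "pub/LSA_email_list.txt"


def split_remote_path(path):
    """Split 'dir/sub/file.txt' into ('dir/sub', 'file.txt')."""
    directory, _, filename = path.rpartition("/")
    return directory, filename


def fetch_lsa_list(local_name="LSA_email_list.txt",
                   host=HOST, remote_path=REMOTE_PATH):
    """Download the address list by anonymous ftp and return the local path."""
    directory, filename = split_remote_path(remote_path)
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        if directory:
            ftp.cwd(directory)
        with open(local_name, "wb") as out:
            ftp.retrbinary("RETR " + filename, out.write)
    return local_name


def first_line(path):
    """Return the first line of the file, which the message says carries
    the date of the current version."""
    with open(path, encoding="latin-1") as handle:
        return handle.readline().rstrip("\n")
```

Calling first_line(fetch_lsa_list()) would then show the version date, so a
copy retrieved earlier can be compared against the current one.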
--------------------------------------------------------------------------
3)
Date: Tue, 16 May 1995 09:52:00 -0700
From: ervintr1 at violet.berkeley.edu
Subject: electronic citations
Since so much information is conveyed by electronic means, citation
of source can be problematic. The following source deals with these
issues.
Electronic Style: A Guide to Citing
Electronic Information by Xia Li and Nancy Crane
Westport: Meckler (1993).
--------------------------------------------------------------------------
4)
Date: Mon, 15 May 1995 08:53:38 +1000
From: Bert.Peeters at modlang.utas.edu.au (Bert Peeters)
Subject: Re: 6.683, Burmese - An apology
I wish to apologise to the entire readership of the Linguist-list for
doing what I had painstakingly tried to avoid doing, which is to
misrepresent anyone's views in response to my query about Burmese.
It's hard to get a good handle on a flurry of incoming mail when you
are not a specialist in (or better still, when you don't know the slightest
thing about) the language you are asking a question on.
I must do one of two things now. Either become even more careful in
editing/paraphrasing answers, or stick to things I know at least something
about.
Once again, my sincerest apologies, to Randy LaPolla and all involved.
Bert Peeters
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Dr Bert Peeters
Department of Modern Languages (French)
University of Tasmania
GPO Box 252C Tel. (002) 202344 +61 02 202344
Hobart TAS 7001 Fax. (002) 207813 +61 02 207813
Australia Email: Bert.Peeters at modlang.utas.edu.au
--------------------------------------------------------------------------
LINGUIST List: Vol-6-710.