alexis nasr alexis.nasr at
Wed Mar 21 18:00:39 UTC 2001

[With the usual apologies for cross-posting]


After many delays (a postal strike being the latest) Oxford's
Humanities Computing Unit is now shipping the revised second edition
of the British National Corpus, which we are calling BNC-WORLD to
indicate that the corpus is now available under licence worldwide.

For background information on the BNC, a 100-million-word
snapshot of the English language at the end of the 20th century,
please visit our website at

A licence to use BNC World is available in two flavours: under the single
user licence (cost 50 pounds) you can install the whole corpus and the
SARA software on a single machine for personal use; alternatively, for
250 pounds you can set up the corpus for networked access by up to 50
people. Under either option, for the same price, you can instead install
just the corpus itself and use whatever software you like. The corpus is supplied
in compressed format as a single tar archive containing over 4000 files of
SGML data. Full documentation of the linguistic and structural tagging is

The part-of-speech tagging in the new edition has been extensively
revised at Lancaster University. Large numbers of errors and
inconsistencies in the tagging and markup have been removed, and the
encoding has been brought into conformance with recent standards. Several
enhancements and corrections have been made in the metadata attached to
each text. The SARA software now includes facilities for lemmatized searching,
improved handling of collocation searching, and the ability to build
and use arbitrary subcorpora.

For ordering information, please visit

Lou Burnard

Message distributed by the Langage Naturel list <LN at>

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
