16.2838, Software: New resources from the BNC

LINGUIST List linguist at linguistlist.org
Mon Oct 3 14:11:06 UTC 2005


LINGUIST List: Vol-16-2838. Mon Oct 03 2005. ISSN: 1068 - 4875.

Subject: 16.2838, Software: New resources from the BNC

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews (reviews at linguistlist.org) 
        Sheila Dooley, U of Arizona  
        Terry Langendoen, U of Arizona  

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Maria Moreno-Rollins <maria at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================  

1)
Date: 16-Sep-2005
From: Ylva Berglund < natcorp at oucs.ox.ac.uk >
Subject: New resources from the BNC 

	
-------------------------Message 1 ---------------------------------- 
Date: Mon, 03 Oct 2005 10:08:29
From: Ylva Berglund < natcorp at oucs.ox.ac.uk >
Subject: New resources from the BNC 
 

New resources from the BNC

We are pleased to announce the release of BNC Baby v 2 - a new CD
containing three English XML corpora (BNC Baby, BNC Sampler and Brown)
along with the latest release of the Xaira corpus search toolkit.

Further information about the CD and how to obtain it can be found below
and at http://www.natcorp.ox.ac.uk/babyinfo.html

BNC Baby is intended for use in teaching and learning about language from a
corpus perspective. Xaira is an open source indexing program, developed
specifically to give students the ability to experiment with many kinds of
searching strategies on many kinds of corpora. You can use the software on
the CD to develop your own searchable XML corpora, as well as to search the
sample corpora supplied with it.

The BNC-Baby disk (second edition) contains:

BNC-Baby
    A subset of the British National Corpus, containing four
    one-million-word samples that represent four major text types in
    Modern English: informal conversation, academic prose, fiction, and
    newspaper text. The texts are annotated with part-of-speech
    information and come with detailed metadata. Documentation of the
    corpus design and contents, and demonstration materials for using it
    in English language teaching, are also provided.

The BNC Sampler
    A different subset of the British National Corpus, containing two
    one-million-word samples that represent spoken and written texts.
    These texts were all hand-tagged and corrected during production of
    the BNC.

A Standard Corpus of Present Day Edited American English (Brown)
    The original Brown corpus, converted to XML, with POS tagging and
    lemmata.

Xaira
    A new search and retrieval program for use with these, or any other,
    XML corpora. The tool lets you search the corpora using the metadata
    and tagging. For more information about Xaira, see
    http://xaira.sf.net.

Price: 30 per CD
Special offer: Order 10 copies or more and pay only 10 per CD!
(prices include standard airmail delivery charges but exclude VAT)
Order the CD at: http://www.natcorp.ox.ac.uk/orderform.html

-------------
British National Corpus
http://www.natcorp.ox.ac.uk/
natcorp at oucs.ox.ac.uk
------------- 
Linguistic Field(s): Applied Linguistics
                     Computational Linguistics
                     Language Acquisition
                     Text/Corpus Linguistics




