17.1967, Software: Two New Corpora of Spoken and Written English

Thu Jul 6 14:25:03 UTC 2006

LINGUIST List: Vol-17-1967. Thu Jul 06 2006. ISSN: 1068 - 4875.

Subject: 17.1967, Software: Two New Corpora of Spoken and Written English

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org) 
        Laura Welcher, Rosetta Project / Long Now Foundation  

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Svetlana Aksenova <svetlana at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 06-Jul-2006
From: Christine Bowles < c.bowles at ucl.ac.uk >
Subject: Two New Corpora of Spoken and Written English 

-------------------------Message 1 ---------------------------------- 
Date: Thu, 06 Jul 2006 10:21:58
From: Christine Bowles < c.bowles at ucl.ac.uk >
Subject: Two New Corpora of Spoken and Written English 

The Survey of English Usage at UCL is pleased to announce the publication of two
exciting new corpora supplied with search software that allows for the retrieval
of grammatical patterns and constructions.

THE DIACHRONIC CORPUS OF PRESENT-DAY SPOKEN ENGLISH (DCPSE)

This corpus contains a total of 800,000 words of grammatically analysed (tagged
and parsed) spontaneous spoken English from comparable categories in the
London-Lund Corpus (1960s/1970s) and the ICE-GB Corpus (1990s): 400,000 words
from each corpus, in the form of tree diagrams. DCPSE is designed to make it
possible to study the grammatical features of spontaneous spoken English over
time. It is the largest single collection of tagged and parsed, orthographically
transcribed spoken English in the world. The corpus will provide linguists
interested in recent linguistic change in English with a new, searchable
database. The corpus is supplied on CD, together with the ICECUP 3.1 search
software (see below) and a 'Getting Started' manual.

RELEASE 2 OF THE BRITISH COMPONENT OF THE INTERNATIONAL CORPUS OF ENGLISH
(ICE-GB)

ICE-GB contains one million words of grammatically analysed (tagged and parsed)
spoken and written present-day British English in the form of tree diagrams. In
Release 2, the spoken part of the corpus has been synchronised with the
corresponding sound recordings (around 75 hours in total), which can be supplied
separately. Together with Release 2 of ICE-GB, we are pleased to
announce the publication of ICECUP 3.1, the dedicated search software for ICE-GB
and DCPSE (see above). New features in ICECUP 3.1 include a lexicon and a
grammaticon, which provide an overview of the distribution of words, tags, and
grammatical patterns. The Fuzzy Tree Fragment (FTF) facility, which allows
searches for grammatical patterns, has been extended and improved. There are
many other improvements to ICECUP in this release, e.g. a thoroughly revised
on-line help manual covering all the new features. A new ICECUP 'Getting
Started' manual is published with the corpus.

ICE-GB SOUND RECORDINGS

The sound recordings (around 75 hours) will be available as a set of CDs
containing uncompressed 'wave' (.wav) files for installation on a hard disk.

For further details, including prices and upgrades, please visit:

http://www.ucl.ac.uk/english-usage/resources/sales.htm

or contact Christine Bowles: c.bowles at ucl.ac.uk

We offer very low prices for students. Please allow 4-6 weeks for delivery. 

Linguistic Field(s): Historical Linguistics
                     Syntax
                     Text/Corpus Linguistics

Subject Language(s): English (eng)
