Corpora: Annotated Old English corpus now available

Susan Pintzuk sp20 at york.ac.uk
Mon Aug 28 13:16:38 UTC 2000


      The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus
                        of Old English

The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old
English (henceforth the Brooklyn Corpus) is a selection of
texts from the Old English Section of the Helsinki Corpus of
English Texts, annotated to facilitate searches on lexical
items and syntactic structure. It is intended for the use of
students and scholars of the history of the English
language.
The Brooklyn Corpus contains 106,210 words of Old English
text; the samples from the longer texts are 5,000 to 10,000
words in length. The texts represent a range of dates of
composition, authors, and genres. The texts in the Brooklyn
Corpus are syntactically and morphologically annotated, and
each word is glossed. The size of the corpus is approximately
12 megabytes.

The syntactic annotations enable the users to pose and answer
questions about word order, constituent order, abstract
structure, and syntactic and morphological characteristics of
the texts in the corpus. The annotations are general-purpose
and as theory-neutral as possible, while still incorporating
the insights of modern linguistic theory; they can be used by
scholars with widely varying research interests. The syntactic
annotations mark constituents, both clausal and non-clausal,
by labelled brackets, with some relations marked by empty
categories. The structure assigned to a sentence by the
labelled bracketing can be quite complex, but it is not a
complete syntactic analysis: the function of the bracketing is
not to assign a structure to Old English sentences but rather
to facilitate searches.

The Brooklyn Corpus is available without fee for educational
and research purposes, but it is not in the public domain.
More information about the Brooklyn Corpus and how to access
it is available at
http://www-users.york.ac.uk/~sp20/corpus.html.
Downloading the Brooklyn Corpus Manual is unrestricted, but
the corpus texts and search scripts are available only to
users who agree formally to the conditions of use.


Susan Pintzuk
Department of Language and Linguistic Science
University of York
Heslington, York YO1 5DD
United Kingdom
sp20 at york.ac.uk
Telephone: +44 1904 432661


