Corpora: Annotated Old English corpus now available

Susan Pintzuk sp20 at york.ac.uk
Sun Aug 27 12:26:28 UTC 2000


      The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus
                        of Old English

The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English
(henceforth the Brooklyn Corpus) is a selection of texts from the Old
English Section of the Helsinki Corpus of English Texts, annotated to
facilitate searches on lexical items and syntactic structure. It is
intended for students and scholars of the history of the English
language. The Brooklyn Corpus contains 106,210 words of Old English
text; the samples from the longer texts are 5,000 to 10,000 words in
length. The texts represent a range of dates of composition, authors,
and genres. The texts in the Brooklyn Corpus are syntactically and
morphologically annotated, and each word is glossed. The size of the
corpus is approximately 12 megabytes.

The syntactic annotations enable users to pose and answer questions
about word order, constituent order, abstract structure, and syntactic and
morphological characteristics of the texts in the corpus. The annotations
are general-purpose and as theory-neutral as possible, while still
incorporating the insights of modern linguistic theory; they can be used
by scholars with widely varying research interests. The syntactic
annotations mark constituents, both clausal and non-clausal, by labelled
brackets, with some relations marked by empty categories. The structure
assigned to a sentence by the labelled bracketing can be quite complex,
but it is not a complete syntactic analysis: the function of the
bracketing is not to assign a structure to Old English sentences but
rather to facilitate searches.
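To illustrate how labelled bracketing supports searches on constituent
order, here is a minimal sketch that reads one bracketed sentence into a
tree and queries it. It is not the corpus's own search scripts. The
bracket syntax, the labels (IP, NP-SBJ, NP-OB1, VBD) and the function
names parse, find, words and object_before_verb are hypothetical
illustrations, not the Brooklyn Corpus's actual annotation scheme.

```python
import re
from dataclasses import dataclass, field

# Hypothetical node type: a label plus children that are either
# sub-constituents (Node) or leaf words (str).
@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

# Tokens are "(", ")" or any run of non-space, non-bracket characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")

def parse(text: str) -> Node:
    """Parse one labelled bracketing such as '(IP (NP he) (VB com))'."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        node = Node(tokens[pos + 1])  # label follows the opening bracket
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:
                node.children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing bracket
        return node

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree

def find(node: Node, label_prefix: str):
    """Yield every constituent whose label starts with label_prefix."""
    if node.label.startswith(label_prefix):
        yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from find(child, label_prefix)

def words(node: Node) -> list:
    """Return the leaf words dominated by node, in order."""
    out = []
    for child in node.children:
        if isinstance(child, Node):
            out.extend(words(child))
        else:
            out.append(child)
    return out

def object_before_verb(clause: Node) -> bool:
    """True if an object NP (hypothetical label NP-OB*) precedes a verb
    (hypothetical label VB*) among the clause's immediate constituents."""
    labels = [c.label for c in clause.children if isinstance(c, Node)]
    obj = next((i for i, lab in enumerate(labels) if lab.startswith("NP-OB")), None)
    verb = next((i for i, lab in enumerate(labels) if lab.startswith("VB")), None)
    return obj is not None and verb is not None and obj < verb
```

For example, parse("(IP (NP-SBJ (PRO he)) (NP-OB1 (N scip)) (VBD worhte))")
yields a tree whose words are he, scip, worhte, and object_before_verb
reports True for it, since the object precedes the verb. Matching on label
prefixes is one simple design choice; a real query language would also
need to handle dominance, precedence at any depth, and empty categories.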

The Brooklyn Corpus is available without fee for educational and research
purposes, but it is not in the public domain. More information about the
Brooklyn Corpus and how to access it is available at
http://www-users.york.ac.uk/~sp20/corpus.html. Downloading the Brooklyn
Corpus Manual is unrestricted, but the corpus texts and search scripts are
available only to users who agree formally to the conditions of use.


Susan Pintzuk
Department of Language and Linguistic Science
University of York
Heslington, York YO1 5DD
United Kingdom
sp20 at york.ac.uk
Telephone: +44 1904 432661
