7.1205, FYI: Indo-European course register, Text corpus of Dutch
The Linguist List
linguist at tam2000.tamu.edu
Sun Sep 1 00:15:29 UTC 1996
LINGUIST List: Vol-7-1205. Sat Aug 31 1996. ISSN: 1068-4875. Lines: 128
Subject: 7.1205, FYI: Indo-European course register, Text corpus of Dutch
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu> (On Leave)
T. Daniel Seely: Eastern Michigan U. <dseely at emunix.emich.edu>
Associate Editor: Ljuba Veselinova <lveselin at emunix.emich.edu>
Assistant Editors: Ron Reck <rreck at emunix.emich.edu>
Ann Dizdar <dizdar at tam2000.tamu.edu>
Annemarie Valdez <avaldez at emunix.emich.edu>
Software development: John H. Remmers <remmers at emunix.emich.edu>
Editor for this issue: dizdar at tam2000.tamu.edu (Ann Dizdar)
Date: Mon, 26 Aug 1996 15:04:53 +0200
From: martinez at em.uni-frankfurt.de ("Fco. Javier Martínez García")
Subject: TITUS: Indo-European Course Register
Date: Fri, 23 Aug 1996 16:00:11 -0000
From: ROB at rulxho.LeidenUniv.nl (Rob van Strien)
Subject: Text Corpus of Dutch
Date: Mon, 26 Aug 1996 15:04:53 +0200
From: martinez at em.uni-frankfurt.de ("Fco. Javier Martínez García")
Subject: TITUS: Indo-European Course Register
The Indo-European Course Register is offered by the Indogermanische
Gesellschaft and TITUS.
The Indo-European Course Register lists the courses related to
Indo-European studies that are offered at German-speaking universities.
See the following URL:
Date: Fri, 23 Aug 1996 16:00:11 -0000
From: ROB at rulxho.LeidenUniv.nl (Rob van Strien)
Subject: Text Corpus of Dutch
On-line access to INL 38 Million Words Text Corpus of Dutch, for
non-commercial purposes.
The Institute for Dutch Lexicology (INL) offers you the possibility to
consult a Dutch text corpus of ca. 38 million words via the Internet.
In 1994 and 1995, a 5 Million Words Corpus with diversified composition
and a 27 Million Words Newspaper Corpus were made accessible in a
similar way. Access is free of charge for non-commercial purposes.
The 38 Million Words Corpus 1996 consists of three main components: a
component with varied composition (1970-1989), a newspaper component
(Meppeler Courant, 1992-1995) and a legal component (1814-1989).
The user can define subcorpora either on the basis of the parameters
(1) corpus component, (2) topic, (3) publication medium/text type, and
(4) period, or on the basis of selections from text surveys presented
on screen. The user can request the size of each defined subcorpus.
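As a rough illustration of parameter-based subcorpus selection and the size query, here is a minimal Python sketch. It assumes a hypothetical per-text metadata record (`Text`), hypothetical helper names, and invented toy data; it is not INL's software, and the announcement does not describe how the real system stores its metadata.

```python
from dataclasses import dataclass

@dataclass
class Text:
    """Hypothetical metadata record for one text (not INL's actual format)."""
    title: str
    component: str   # corpus component, e.g. "varied", "newspaper", "legal"
    topic: str
    medium: str      # publication medium / text type
    year: int
    n_words: int

def define_subcorpus(texts, component=None, topic=None, medium=None, period=None):
    """Keep texts that match every parameter given; None means 'any value'."""
    selected = []
    for t in texts:
        if component is not None and t.component != component:
            continue
        if topic is not None and t.topic != topic:
            continue
        if medium is not None and t.medium != medium:
            continue
        if period is not None and not (period[0] <= t.year <= period[1]):
            continue
        selected.append(t)
    return selected

def subcorpus_size(texts):
    """Size of a subcorpus in running words."""
    return sum(t.n_words for t in texts)

# Toy records, invented purely for illustration.
corpus = [
    Text("A", "newspaper", "sport", "newspaper", 1993, 1200),
    Text("B", "newspaper", "politics", "newspaper", 1994, 800),
    Text("C", "legal", "law", "statute", 1814, 5000),
]
news_93_95 = define_subcorpus(corpus, component="newspaper", period=(1993, 1995))
print(subcorpus_size(news_93_95))  # 2000
```

Parameters left as `None` impose no restriction, so combining several parameters narrows the selection step by step, as in the parameter list above.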
The texts have been automatically annotated with lemma (head word) and
two types of part of speech (POS): a global one (13 POS categories)
and a fine-grained one (with subcategorization) conforming to the
MECOLB standard (EC project MLAP93-21 MECOLB; coordinator R. Neumann,
Institut fuer Deutsche Sprache, Mannheim). The MECOLB tagset for Dutch
was developed in cooperation with the TOSCA Research Group (University
of Nijmegen), under the responsibility of Prof. dr. J. Aarts.
Most of the data has not been corrected, either at the level of the
text or at the level of POS and head word.
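A minimal sketch of what a token annotated on these levels (word form, head word, global POS, fine-grained POS) might look like, and of matching tokens on several levels at once. The `Token` layout, the `matches` helper and the tag strings are hypothetical placeholders, not actual MECOLB tags or INL's data format. Because the annotation is automatic and largely uncorrected, any match on lemma or POS inherits whatever tagging errors the data contains.

```python
from typing import NamedTuple, Optional

class Token(NamedTuple):
    """One annotated token (hypothetical layout, not INL's format)."""
    form: str       # word form as it appears in the text
    lemma: str      # head word
    pos: str        # global POS category
    pos_fine: str   # fine-grained tag with subcategorization

def matches(tok: Token, form: Optional[str] = None, lemma: Optional[str] = None,
            pos: Optional[str] = None, pos_fine: Optional[str] = None) -> bool:
    """True if the token satisfies every constraint given (Boolean AND across levels)."""
    return ((form is None or tok.form == form) and
            (lemma is None or tok.lemma == lemma) and
            (pos is None or tok.pos == pos) and
            (pos_fine is None or tok.pos_fine == pos_fine))

# Placeholder tags, invented for illustration only.
sentence = [
    Token("De", "de", "Art", "Art-def"),
    Token("man", "man", "N", "N-sg"),
    Token("loopt", "lopen", "V", "V-fin"),
]
hits = [t.form for t in sentence if matches(t, lemma="lopen", pos="V")]
print(hits)  # ['loopt']
```

Searching by lemma rather than word form is what lets a single query for "lopen" find inflected forms such as "loopt".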
The retrieval system allows you to search for single words or for word
patterns, including some predefined syntactic patterns that can be
modified by the user. There are two query languages, which differ in
formalism. Searches may address the word form, either type of part of
speech, and the head word, both separately and in combination, using
Boolean operators and proximity searches. During the search, data on
frequency and distribution over the texts are provided at several
levels. The output is usually a list of items or a series of
concordances (words in context) with a variable, user-defined textual
context. Sorting facilities can support your analysis of the output
data. Within some limitations due to copyright, the output of your
searches can be transferred to your own computer by e-mail. Transferring
complete texts or substantial parts of texts is not allowed.
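The announcement does not give the syntax of either query language, so the following Python sketch only illustrates the general techniques named above: a proximity search (two words within a given distance of each other) and a keyword-in-context (KWIC) concordance line with a user-chosen context width. The function names and the example sentence are hypothetical and do not reflect INL's query languages.

```python
from typing import List, Tuple

def proximity_hits(words: List[str], a: str, b: str, max_dist: int) -> List[Tuple[int, int]]:
    """Position pairs (i, j) with words[i] == a, words[j] == b and 0 < |i - j| <= max_dist."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [j for j, w in enumerate(words) if w == b]
    return [(i, j) for i in pos_a for j in pos_b if 0 < abs(i - j) <= max_dist]

def kwic(words: List[str], index: int, context: int) -> str:
    """Concordance line: up to `context` words on each side, keyword in brackets."""
    left = " ".join(words[max(0, index - context):index])
    right = " ".join(words[index + 1:index + 1 + context])
    return f"{left} [{words[index]}] {right}".strip()

# Hypothetical example sentence.
words = "de man loopt met de hond naar huis".split()
for i, j in proximity_hits(words, "man", "hond", 4):
    print(kwic(words, i, 2))  # [de] man ... would need index 0; here prints: de [man] loopt met
```

Widening `context` corresponds to the variable, user-defined textual context mentioned above; a proximity hit only records positions, and the concordance line is built from them afterwards.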
The providers of the texts have given permission for use of the texts
for non-commercial, research purposes only.
Please note that for optimal use of the retrieval system, a VT 220
(or higher) terminal or an appropriate terminal emulator (e.g. Kermit)
is recommended.
For access to the corpora, you must sign an individual user agreement.
There is a separate user agreement for each corpus. An electronic user
agreement form can be obtained from our mailserver
Mailserv at Rulxho.Leidenuniv.NL by typing one of the following
commands in the body of your e-mail:
SEND [38MLN96]AGREEMNT.USE   (38 Million Words Corpus 1996)
SEND [27MLN95]AGREEMNT.USE   (27 Million Words Newspaper Corpus)
SEND [5MLN94]AGREEMNT.USE    (5 Million Words Corpus 1994)
Please make a hard copy of the agreement form, sign it, keep a copy
yourself, and return a signed copy to: Institute for Dutch Lexicology
INL, P.O. Box 9515, 2300 RA Leiden, The Netherlands. Fax: 31 71 527
After receipt of your signed user agreement, you will be sent your
username and password.
If you need additional information, please send an e-mail message to
Helpdesk at Rulxho.Leidenuniv.NL, or send a fax to Mrs. dr. J.G. Kruyt.
LINGUIST List: Vol-7-1205.