[Corpora-List] ELRA - Language Resources Catalogue - Update
ELDA
info at elda.org
Thu Mar 16 11:26:08 UTC 2006
Our apologies if you have received multiple copies of this announcement
*******************************************************************
ELRA - Language Resources Catalogue - Update
*******************************************************************
We are happy to announce that new Text and Speech Language Resources are
now available in our catalogue.
To view all the Language Resources available, you can visit our on-line
catalogue: http://catalog.elda.org/index.php?language=en
L0058: British English Source Lexicon (BESL) version 2.2
BESL consists of over 230,000 lemmas, over 350,000 word forms, 60,000
proper nouns, 3,000 abbreviations, and 58,000 multi-word compound nouns.
Each headword is provided with a full listing of all inflected forms and
other morphological variation. Every word form is marked for part of
speech (using Penn Treebank notation). Most single-word forms include a
representation of IPA pronunciation. BESL covers both British and
American English, and other spelling variants, with cross-references
between corresponding forms. BESL is provided in XML.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=834&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
L0059: Offensive Word Filter 1
This list features 4,500 words and expressions for UK and US English
usage with a grading system describing vocabulary type and offensive
strength for each term, plus collocational information to help identify
the terms in context. The list is provided in tab-delimited ASCII.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=835&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
L0060: Offensive Word Filter 2
This list features 2,000 words and expressions, classified into 13
categories, for UK and US English usage with a grading system describing
vocabulary type and offensive strength for each term, plus collocational
information to help identify the terms in context. The list is provided
in an Excel spreadsheet.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=836&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
L0061: The Oxford Spanish Dictionary
This dictionary consists of 300,000 words and phrases and 500,000
translations, covering 24 regional varieties of Spanish. It includes
thousands of authentic example sentences carefully selected to
illustrate the full range of meanings and typical contexts. The
dictionary is provided in XML or SGML.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=837&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
L0062: French Source Lexicon
This source lexicon contains morphological and phonetic data for French.
It consists of over 90,000 headwords/lemmas, 400,000 wordforms, 1,000
abbreviations, and 35,000 proper nouns. Each headword lemma is provided
with a full listing of its possible syntactic forms and spelling
variants, along with information on their relationship to the headword
form. In addition, a representation of the IPA pronunciation is given
for every form. There is also information on domains in which the
headwords are used, e.g. Computing, Engineering, Zoology. The lexicon is
provided in SGML.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=838&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
L0063: Spanish Source Lexicon
This source lexicon contains morphological and phonetic data for
Spanish. It consists of over 575,000 wordforms, 1,000 abbreviations, and
25,000 proper nouns. Each headword lemma is provided with a full listing
of its possible syntactic forms and spelling variants, along with
information on their relationship to the headword form. In addition, a
representation of the IPA pronunciation is given for every form. There
is also information on domains in which the headwords are used, e.g.
Computing, Engineering, Zoology. The lexicon is provided in SGML.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=839&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
L0064: Italian Source Lexicon
This source lexicon contains morphological and phonetic data for
Italian. It consists of over 115,000 headwords/lemmas and 925,000
wordforms. Each headword lemma is provided with a full listing of its
possible syntactic forms and spelling variants, along with information
on their relationship to the headword form. In addition, a
representation of the IPA pronunciation is given for every form. There
is also information on domains in which the headwords are used, e.g.
Computing, Engineering, Zoology. The lexicon is provided in SGML.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=840&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
T0368: Multilingual Wordbank
The Multilingual Wordbank consists of word translation glossaries
designed for the travel/handy-reference market. It consists of 17,500
core terms from English into French, German, Italian, Spanish, and
Portuguese, plus full coverage of local variations in American English,
Latin American Spanish, and Brazilian Portuguese. Every word is given a
frequency ranking, which can be used as a guide to user levels. In
addition, all translations in the Wordbank are provided along with
appropriate part of speech and gender information. It is provided in
tab-delimited text.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=841&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
T0369: Multilingual Phrasebank
The Phrasebank consists of 3,000 base phrases per language organized
under 9 different topics, many of which are further subdivided. It is
presented in a compressed format, with substitutable elements bracketed,
and one or several alternatives included within the entry, reducing
storage space wasted due to repetition of common material. The
compression is extended further by reference to "template" sets of
common terms, e.g. Days of the Week, Parts of the Body, allowing a base
phrase to be combined with up to 100 different terms. Nine languages are
covered (including regional variants): UK English, US English, French,
German, Italian, European Spanish, Latin American Spanish, European
Portuguese, Brazilian Portuguese. It is provided in tab-delimited text
for phrases and Excel spreadsheets for template lists.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=842&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
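As a rough illustration of how a compressed phrase entry of the kind
described above might be expanded, here is a minimal Python sketch. The
notation is an assumption made for the example and is not taken from the
Phrasebank files: "[a|b]" stands for inline alternatives, "{NAME}" for a
reference to a template set, and the function name expand and the
template name DAYS are hypothetical.

```python
import itertools
import re

# Hypothetical notation, NOT the documented Phrasebank format:
#   [a|b]   inline alternatives for a substitutable element
#   {NAME}  reference to a "template" set of common terms
SLOT = re.compile(r"\[([^\]]*)\]|\{([^}]*)\}")


def expand(phrase, templates):
    """Return every full phrase encoded by one compressed entry."""
    parts = []  # alternating literal text and option lists, in order
    pos = 0
    for match in SLOT.finditer(phrase):
        # Literal text before the slot is a single-option part.
        parts.append([phrase[pos:match.start()]])
        if match.group(1) is not None:
            # Bracketed element: alternatives are listed inline.
            parts.append(match.group(1).split("|"))
        else:
            # Template reference: look the terms up by name.
            parts.append(templates[match.group(2)])
        pos = match.end()
    parts.append([phrase[pos:]])
    # One output phrase per combination of choices.
    return ["".join(combo) for combo in itertools.product(*parts)]


templates = {"DAYS": ["Monday", "Tuesday"]}
phrases = expand("I would like a [single|double] room for {DAYS}", templates)
```

Under these assumptions, one stored entry with two inline alternatives
and a two-term template yields four full phrases, which is the kind of
saving on repeated material that the description refers to.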
T0370: Dictionary of Law
Over 4,000 entries define and explain the major terms, concepts,
processes, and the organization of the English legal system. It features
authoritative and up-to-date articles which have been written by
practising and academic lawyers. New entries cover the Woolf reforms,
human rights law, as well as family law, central and local government,
and international law. The dictionary is provided in XML.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=843&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
T0371: Dictionary of Medicine
Over 10,000 clear and concise entries cover all major medical and
surgical specialities. The dictionary reflects recent developments in
the medical field, covering new drugs in clinical use, as well as new
advances in genetics, infertility treatment, cancer, organ
transplantation, and BSE. The dictionary is provided in XML.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=844&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
S0209: Oxford English phonetics files
Derived from a range of Oxford Dictionaries, these files list word forms
together with an IPA representation of their pronunciation. They contain
250,000 words. Pronunciation is based on standard British English. Word
forms include dictionary lemmas and inflections or other morphological
variations, plus a wide range of proper name and encyclopedic material.
The data also includes a large number of common multi-word phrases and
compound nouns. The files are provided in XML.
For more information, see
http://catalog.elda.org:8080/product_info.php?cPath=37_41&products_id=845&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
S0210: Shorter Oxford English Dictionary - Audio Files
These are recordings of the headwords of the Shorter Oxford English
Dictionary, in British English pronunciation. The resource consists of
over 95,000 sound files, provided in 11 kHz 8-bit WAV.
For more information, see
http://catalog.elda.org:8080/product_info.php?products_id=846&osCsid=f929035bd1601c2221f5beeb5144689c
W0041: Corpus of Contemporaneous Spanish Novels
This corpus consists of 11 novels written in Castilian Spanish by
Inmaculada Ferrer-Vidal Turull, a contemporary author.
For more information, see
http://catalog.elda.org:8080/product_info.php?products_id=847&osCsid=f929035bd1601c2221f5beeb5144689c
For more information on the catalogue, please contact Valérie Mapelli
mailto:mapelli at elda.org