[Corpora-List] ELRA - Language Resources Catalogue - Update

ELDA info at elda.org
Thu Mar 16 11:26:08 UTC 2006


Our apologies if you have received multiple copies of this announcement.
 
*******************************************************************
ELRA - Language Resources Catalogue - Update

*******************************************************************

We are happy to announce that new Text and Speech Language Resources are 
now available in our catalogue.
To view all the Language Resources available, you can visit our on-line 
catalogue: http://catalog.elda.org/index.php?language=en

L0058: British English Source Lexicon (BESL) version 2.2
BESL consists of over 230,000 lemmas, over 350,000 word forms, 60,000 
proper nouns, 3,000 abbreviations, and 58,000 multi-word compound nouns. 
Each headword is provided with a full listing of all inflected forms and 
other morphological variation. Every word form is marked for part of 
speech (using Penn TreeBank notation). Most single-word forms include a 
representation of IPA pronunciation. BESL covers both British and 
American English spellings, as well as other spelling variants, with 
cross-references between corresponding forms. BESL is provided in XML.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=834&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

L0059: Offensive Word Filter 1
This list features 4,500 words and expressions for UK and US English 
usage with a grading system describing vocabulary type and offensive 
strength for each term, plus collocational information to help identify 
the terms in context. The list is provided in tab-delimited ASCII.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=835&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

L0060: Offensive Word Filter 2
This list features 2,000 words and expressions, classified into 13 
categories, for UK and US English usage with a grading system describing 
vocabulary type and offensive strength for each term, plus collocational 
information to help identify the terms in context. The list is provided 
in an Excel spreadsheet.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=836&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

L0061: The Oxford Spanish Dictionary
This dictionary consists of 300,000 words and phrases and 500,000 
translations, covering 24 regional varieties of Spanish. It includes 
thousands of authentic example sentences carefully selected to 
illustrate the full range of meanings and typical contexts. The 
dictionary is provided in XML or SGML.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=837&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

L0062 : French Source Lexicon
This source lexicon contains morphological and phonetic data for French. 
It consists of over 90,000 headwords/lemmas, 400,000 wordforms, 1,000 
abbreviations, and 35,000 proper nouns. Each headword lemma is provided 
with a full listing of its possible syntactic forms and spelling 
variants, along with information on their relationship to the headword 
form. In addition, a representation of the IPA pronunciation is given 
for every form. There is also information on domains in which the 
headwords are used, e.g. Computing, Engineering, Zoology. The lexicon is 
provided in SGML.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=838&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

L0063 : Spanish Source Lexicon
This source lexicon contains morphological and phonetic data for 
Spanish. It consists of over 575,000 wordforms, 1,000 abbreviations, and 
25,000 proper nouns. Each headword lemma is provided with a full listing 
of its possible syntactic forms and spelling variants, along with 
information on their relationship to the headword form. In addition, a 
representation of the IPA pronunciation is given for every form. There 
is also information on domains in which the headwords are used, e.g. 
Computing, Engineering, Zoology. The lexicon is provided in SGML.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=839&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

L0064 : Italian Source Lexicon
This source lexicon contains morphological and phonetic data for 
Italian. It consists of over 115,000 headwords/lemmas and 925,000 
wordforms. Each headword lemma is provided with a full listing of its 
possible syntactic forms and spelling variants, along with information 
on their relationship to the headword form. In addition, a 
representation of the IPA pronunciation is given for every form. There 
is also information on domains in which the headwords are used, e.g. 
Computing, Engineering, Zoology. The lexicon is provided in SGML.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=42_44&products_id=840&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

T0368 : Multilingual Wordbank
The Multilingual Wordbank consists of word translation glossaries 
designed for the travel/handy-reference market. It contains 17,500 
core terms translated from English into French, German, Italian, 
Spanish, and Portuguese, plus full coverage of local variations in 
American English, Latin American Spanish, and Brazilian Portuguese. 
Every word is given a 
frequency ranking, which can be used as a guide to user levels. In 
addition, all translations in the Wordbank are provided along with 
appropriate part of speech and gender information. It is provided in 
tab-delimited text.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=841&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
 
T0369 : Multilingual Phrasebank
The Phrasebank consists of 3,000 base phrases per language organized 
under 9 different topics, many of which are further subdivided. It is 
presented in a compressed format, with substitutable elements bracketed, 
and one or several alternatives included within the entry, reducing 
storage space wasted due to repetition of common material. The 
compression is extended further by reference to "template" sets of 
common terms, e.g. Days of the Week, Parts of the Body, allowing a base 
phrase to be combined with up to 100 different terms. Nine languages 
are covered (including regional variants): UK English, US English, 
French, German, Italian, European Spanish, Latin American Spanish, 
European Portuguese, and Brazilian Portuguese. The Phrasebank is 
provided in tab-delimited text for phrases and Excel spreadsheets for 
template lists.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=842&osCsid=7baa5c04a4e133faf8f27719b14eb3bf
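
As a rough illustration of how such a compressed entry could be 
expanded, here is a minimal Python sketch. The notation is an 
assumption made for this example only and is not taken from the 
Phrasebank itself: "[a|b]" stands for inline alternatives and 
"[@NAME]" for a reference to a template list. The function name, the 
sample phrase, and the template contents are likewise hypothetical.

```python
import itertools
import re

# Hypothetical notation, NOT the actual Phrasebank format:
#   [a|b]    inline alternatives for a substitutable element
#   [@NAME]  reference to a "template" list of common terms
SLOT = re.compile(r"\[([^\[\]]*)\]")


def expand(phrase, templates):
    """Return every full phrase encoded by one compressed entry."""
    # re.split with one capturing group alternates literal text
    # (even indexes) and slot bodies (odd indexes).
    parts = SLOT.split(phrase)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            choices.append([part])               # fixed text
        elif part.startswith("@"):
            choices.append(templates[part[1:]])  # template reference
        else:
            choices.append(part.split("|"))      # inline alternatives
    # One output phrase per combination of choices.
    return ["".join(combo) for combo in itertools.product(*choices)]


# Illustrative data only.
templates = {"DAYS": ["Monday", "Tuesday"]}
for line in expand("I [arrive|leave] on [@DAYS].", templates):
    print(line)
```

Under these assumptions, a single stored entry stands for four full 
phrases, which is the kind of saving the description above refers to.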
   
T0370 : Dictionary of Law
Over 4,000 entries define and explain the major terms, concepts, 
processes, and the organization of the English legal system. It features 
authoritative and up-to-date articles which have been written by 
practising and academic lawyers. New entries cover the Woolf reforms, 
human rights law, as well as family law, central and local government, 
and international law. The dictionary is provided in XML.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=843&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

T0371 : Dictionary of Medicine
Over 10,000 clear and concise entries cover all major medical and 
surgical specialities. The dictionary reflects recent developments in 
the medical field, covering new drugs in clinical use, as well as new 
advances in genetics, infertility treatment, cancer, organ 
transplantation, and BSE. The dictionary is provided in XML.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=24&products_id=844&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

S0209 : Oxford English phonetics files
Derived from a range of Oxford Dictionaries, these files list word forms 
together with a representation of their IPA pronunciation. They contain 
250,000 words. Pronunciation is based on standard British English. Word 
forms include dictionary lemmas and inflections or other morphological 
variations, plus a wide range of proper name and encyclopedic material. 
The data also includes a large number of common multi-word phrases and 
compound nouns. The files are provided in XML.
For more information, see 
http://catalog.elda.org:8080/product_info.php?cPath=37_41&products_id=845&osCsid=7baa5c04a4e133faf8f27719b14eb3bf

S0210 : Shorter Oxford English Dictionary - Audio Files
These are recordings of the headwords of the Shorter Oxford English 
Dictionary, in British English pronunciation. The resource consists of 
over 95,000 sound files, provided in 11 kHz, 8-bit WAV.
For more information, see 
http://catalog.elda.org:8080/product_info.php?products_id=846&osCsid=f929035bd1601c2221f5beeb5144689c

W0041 : Corpus of Contemporaneous Spanish Novels
This corpus consists of 11 novels written in Castilian Spanish by 
Inmaculada Ferrer-Vidal Turull, a contemporary author.
For more information, see 
http://catalog.elda.org:8080/product_info.php?products_id=847&osCsid=f929035bd1601c2221f5beeb5144689c


For more information on the catalogue, please contact Valérie Mapelli 
mailto:mapelli at elda.org


