8.770, FYI: Exercises, Lg Resources, Workshop

linguist at linguistlist.org
Fri May 23 13:24:51 UTC 1997


LINGUIST List:  Vol-8-770. Fri May 23 1997. ISSN: 1068-4875.

Subject: 8.770, FYI: Exercises, Lg Resources, Workshop

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Associate Editors: Ljuba Veselinova <ljuba at linguistlist.org>
                   Ann Dizdar <ann at linguistlist.org>
Assistant Editor:  Sue Robinson <sue at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/


Editor for this issue: T. Daniel Seely <seely at linguistlist.org>

=================================Directory=================================

1)
Date:  Tue, 20 May 1997 10:53:38 -0700
From:  Marmo Soemarmo <soemarmo at oak.cats.ohiou.edu>
Subject:  Exercises on the Web

2)
Date:  Tue, 20 May 1997 20:06:14 +0200 (MET DST)
From:  elra at calvanet.calvacom.fr (Khalid Choukri)
Subject:  ELRA New Language Resources

3)
Date:  Wed, 21 May 1997 10:03:58 +0200 (MDT)
From:  Helmer Strik <strik at let.kun.NL>
Subject:  modeling pronunciation variation for ASR

-------------------------------- Message 1 -------------------------------

Date:  Tue, 20 May 1997 10:53:38 -0700
From:  Marmo Soemarmo <soemarmo at oak.cats.ohiou.edu>
Subject:  Exercises on the Web

I put a sample of my exercises on the web. Check them out at:

	http://www.cats.ohiou.edu/~linguist/lexicon/lexicon.htm

In case you haven't checked out my Language Games, it's at:

	http://ouvaxa.cats.ohiou.edu/~soemarmo/games/menu.htm

Marmo



-------------------------------- Message 2 -------------------------------

Date:  Tue, 20 May 1997 20:06:14 +0200 (MET DST)
From:  elra at calvanet.calvacom.fr (Khalid Choukri)
Subject:  ELRA New Language Resources

[ We apologise for the duplicate posting of this announcement ]


                  EUROPEAN LANGUAGE RESOURCES ASSOCIATION (ELRA)


                     ***  NEW CATALOGUE & NEW RESOURCES  ***





The new release of the ELRA catalogue (vol2N1) has grown and currently
consists of:

1) Spoken resources: 37 databases in several languages (recordings from
microphone, telephone, continuous speech, isolated words, phonetic
dictionaries, etc.).

2) Written resources:
     * 14 monolingual and multilingual corpora
     * 28 monolingual lexica
     * Around 60 multilingual lexica
     * A linguistic software platform and a grammar development platform

3) Terminological resources: over 360 databases with a wide range of domains
and several languages (Catalan, Danish, English, French, German, Italian,
Latin, Polish, Portuguese, Spanish, Turkish).

Since our last news on this electronic list, new resources have been
negotiated by ELRA and are now available. These are:

SPEECH AND RELATED RESOURCES


                        ELRA-S0035 Phonolex (BAS/DFKI):

PHONOLEX consists of a simple list of word forms (666,237 inflected words)
with a set of features e.g. orthography (German 'Umlauts' in LaTeX format,
capital nouns, old German spelling rules), linguistic information (nouns,
verbs, etc.), pronunciation and a list of empirical pronunciations.

Language: German
Format:   ASCII
Mark-up:  extended SAM-PA (PhonDat-Verbmobil)

--------------------------------------------------------------------------


     ELRA-S0036 Speri-Data AG Basic dictionaries (colloquial language):

These dictionaries contain a daily-life vocabulary. They include phonetic
transcriptions with related phoneme lists. The following languages are
available:

Language        Entries
Danish           8,000
Dutch           12,000
English (UK)     8,000
Finnish         10,000
French          19,000
German          13,000
Italian         23,000
Norwegian        8,000
Portuguese       9,000
Spanish         13,000
Swedish         10,000

--------------------------------------------------------------------------

                   ELRA-S0037 Speri-Data AG Technical dictionaries:

All dictionaries contain phonetic transcriptions, with related phoneme
lists. The following dictionaries are available (the label basic dictionary
refers to the above ELRA-S0036):

Domain                        Entries
Banking French                 10,200
Banking German                 10,200
Banking Italian                10,200
Banking Spanish                10,200
Radiology German               42,000 (including basic dictionary)
Radiology English              16,000
Medical German                130,000 (including basic dictionary)
Jurisprudence German           31,000
Jurisprudence German           55,000 (including basic dictionary)
Insurance German & English     37,000


A peculiarity of medical dictionaries in German-speaking countries has to be
taken into consideration: doctors in Germany, Austria and Switzerland may
use not the original Latin technical term but a Germanized spelling of the
Latin word or a German technical term (see examples below). Medical
dictionaries therefore have to contain three different terms.

Technical term    Technical term         Technical term
in Latin          in German spelling     in German

Appendicitis      Appendizitis           Blinddarmentzündung
Eccema            Eczema                 Ekzem
Diarrhoe          Diarrhö or Diarrhöe    Durchfall, Durchfluss
Carbunculus       Karbunkel              Geschwür

--------------------------------------------------------------------------


                ELRA-S0038 Siemens VoiceMail (American English)

VoiceMail consists of 17.5 hours of read acoustic speech, divided into 9.5
hours of transliterated speech and 8 hours of non-transliterated speech,
recorded over the digital telephone network (ISDN) from 921 speakers
from the USA. It contains orthographic transliterations for about
25,000 utterances (of 34,912 utterances in total).

Language: American English
Standard in use: headerless, one separate transliteration file comprising
all utterances of all speakers
Sampling rate: 8 kHz
Speakers: 377 males and 544 females
Size: 17.5 hours
Medium: 2 CD-ROMs


WRITTEN RESOURCES - MONOLINGUAL LEXICA


               ELRA-L0021 Dictionary of French verbs - CORA:

This dictionary contains 25,610 verbs with usage domains, level of language
(familiar, popular, literary, Quebec and Swiss terms, etc.), conjugation,
auxiliary, verbal adjectives in -able, -ant or -é, encoded syntactical
constructions (subject, direct & indirect object, adverb), sample phrases,
synonyms, operators enabling semantic-syntactic classification, encoding of
derived forms in -age, -ment, -tion, -oir, -ure, deverbal nouns, base words
from which verbs can be derived, a scale of usage ranging from 1 to 6, like
those used by commercial dictionaries (basic vocabulary, extended,
specialised, etc.).
Codes enable automatic production of conjugation forms, derived nouns and
adjectives and, if necessary, the production of potential forms.

--------------------------------------------------------------------------

                  ELRA-L0022 Dictionary of words - CORA:

This dictionary is composed of 126,844 words, with usage domains,
grammatical category, gender, number, uncountable, collective, adjectival,
nominal, verbal, adverbial derived forms according to the type of words.

--------------------------------------------------------------------------

                  ELRA-L0023 Dictionary of affixes - CORA:

4,286 suffixes and prefixes, plus information on their verbal, nominal or
adjectival bases or on the verbal basis of Greco-Latin items. This
dictionary does not include the suffixes contained in the dictionaries of
French verbs (ELRA-L0021) and words (ELRA-L0022), such as -age, -ment, -if,
-oir.

--------------------------------------------------------------------------

              ELRA-L0024 Dictionary of verb phrases - CORA:

Dictionary of 3,480 entries based on the model of the dictionary of French
verbs (ELRA-L0021).

--------------------------------------------------------------------------

          ELRA-L0025 Dictionary of invariable forms and phrases - CORA:

Dictionary of 4,783 entries based on the model of the dictionary of words
(ELRA-L0022).

--------------------------------------------------------------------------

        ELRA-L0026 Dictionary of exclamatory stereotyped phrases - CORA:

Dictionary of 1,901 entries based on the model of the dictionary of
invariable forms and phrases (ELRA-L0025).


--------------------------------------------------------------------------

                  ELRA-L0027 Dictionary of French local authorities - CORA:

38,965 entries in lower case with accents, checked against the Michelin
guide, without localities. A link can be made to the dictionary of words
(ELRA-L0022), which contains inhabitants' names and their correspondence
with town names.

--------------------------------------------------------------------------

     ELRA-L0028 Dictionary of noun phrases and plural-only words - CORA:

2,138 compound names and 1,397 entries of plural-only words.


For further information, please contact :

     ELRA/ELDA
     87, Avenue d'Italie
     FR-75013 PARIS
     FRANCE
     Tel : +33 01 45 86 53 00
     Fax : +33 01 45 86 44 88
     E-mail : info-elra at calva.net
     WWW: http://www.icp.grenet.fr/ELRA/home.html


....................................
Khalid CHOUKRI
ELRA /ELDA
Tel. +33 1 45 86 53 00
Fax. +33 1 45 86 44 88
87, Avenue D'ITALIE, 75013 PARIS
Email: elra at calvanet.calvacom.fr
Web:  http://www.icp.grenet.fr/ELRA/home.html
....................................


-------------------------------- Message 3 -------------------------------

Date:  Wed, 21 May 1997 10:03:58 +0200 (MDT)
From:  Helmer Strik <strik at let.kun.NL>
Subject:  modeling pronunciation variation for ASR

Below is some information on the workshop
'modeling pronunciation variation for automatic speech recognition',
which will be held on 4-6 May 1998 in The Netherlands.
More information about the workshop is available at
http://lands.let.kun.nl/pron-var/.

Ajo,
Helmer



                             advance notice
                 ESCA Tutorial and Research Workshop on


                    MODELING PRONUNCIATION VARIATION
                    FOR AUTOMATIC SPEECH RECOGNITION


                              4-6 May 1998

          at Rolduc, a former monastery in the city of Kerkrade
                    in the south of The Netherlands


                              Organized by


                                  ESCA
               European Speech Communication Association


                        COST Telecom Action 249
            Continuous Speech Recognition over the Telephone


                                  A2RT
             'Automatic Acoustic Recognition Technologies'
                       Dept. of Language & Speech
                           Nijmegen University



TOPIC OF THE WORKSHOP

     Automatic Speech Recognizers (ASRs) have improved substantially
     during the last decade. It has now become possible to use ASRs for
     many practical applications. However, when ASRs are used (and
     tested) under realistic conditions, the problem of pronunciation
     variation almost always emerges. This problem has been recognized
     by several research groups, and more and more effort is spent
     nowadays on solving this problem (see e.g. the steadily growing
     number of publications on this topic, especially in conference
     proceedings).

     During this workshop we want to discuss this problem in depth and
     the different ways in which it could be solved. Although part of
     pronunciation variation is certainly language-dependent (i.e. the
     phonological and phonetic processes differ between languages), a
     large part of the variation is language-independent. Furthermore,
     the techniques that can be used to solve this problem, i.e. to
     model pronunciation variation for ASR, are usually
     language-independent.


WWW-SITE

     Up-to-date information about the workshop is available at
     http://lands.let.kun.nl/pron-var/.


SCIENTIFIC COMMITTEE

     Elizabeth Shriberg
     Herve Bourlard
     Li Deng
     Lori Lamel
     Mari Ostendorf
     Patti Price
     Roger Moore
     Rolf Carlson
     Sadaoki Furui
     Steve Young


CONTACT PERSON

     Helmer Strik
     Dept. of Language and Speech
     P.O. Box 9103
     6500 HD Nijmegen
     The Netherlands

     Tel.nr.: 31-24-3616104
     Fax nr.: 31-24-3615939
     E-mail : Strik at let.kun.nl
     URL http://lands.let.kun.nl/TSpublic/strik

---------------------------------------------------------------------------
LINGUIST List: Vol-8-770


