<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <small>Our apologies if you have received multiple copies of this

      announcement. <br>

      <br>

      *****************************************************************

    </small> <small><br>

      ELRA - Language Resources Catalogue - Update <br>

      *****************************************************************

      <br>

      <font face="Times New Roman, Times, serif"><big><small><br>

          </small></big></font>ELRA is happy to announce that 1 new

      Monolingual Lexicon, 3 new Speech Resources and 3 new Evaluation

      Packages are now available in its catalogue.</small> <small><br>

      Updated versions of the ESTER Corpus, ESTER Evaluation

      Package and Bulgarian WordNet have also been released.  <br>

      <br>

      <b>1) New Language Resources:<br>

        <br>

        ELRA-L0088 Arabic Morphological Dictionary<br>

      </b>The Arabic Morphological Dictionary contains 7,912,551

      entries, including 6,247,291 nouns, 1,537,499 verbs, 127,563

      adjectives and 198 grammatical words. All files are provided as plain
      text in UTF-8 character encoding, amounting to about 154 MB of

      data.</small> <big><big><big><big><span class="apple-style-span"><span

                style="font-size: 8pt; color: black;" lang="EN-GB"><big><big><br>

                  </big></big></span></span></big></big></big></big><small>For

      more information, see: <a

        href="http://catalog.elra.info/product_info.php?products_id=1163">http://catalog.elra.info/product_info.php?products_id=1163</a><br>

      <br>

      <b>ELRA-S0338 ESTER 2 Corpus<br>

      </b>ESTER 2 Corpus, produced within the ESTER 2 evaluation

      campaign, consists of a manually transcribed radio broadcast news

      corpus amounting to about 100 hours and quick transcriptions of
      African radio broadcasts amounting to about 6 hours. An annotation of named

      entities is provided within the development data (about 6 hours).

    </small> <big><big><big><big><span class="apple-style-span"><span

                style="font-size: 8pt; color: black;" lang="EN-GB"><big><big><br>

                  </big></big></span></span></big></big></big></big><small>For

      more information, see: <a

href="http://catalog.elra.info/product_info.php?cPath=37_46&products_id=1167">http://catalog.elra.info/product_info.php?cPath=37_46&products_id=1167</a><br>

      <br>

      <b>ELRA-S0339 Acoustic database for Polish unit selection speech

        synthesis<br>

      </b>This database contains parliamentary statements and newspaper

      reviews read by a semi-professional male speaker. It consists of a

      selection of 2150 sentences annotated and manually verified,

      including 100 rare phonemes in words. The total duration of the

      recordings is 3.45 hours. The database is phonetically annotated

      and manually corrected, and comes with a lexicon of 11761 words

      with phonetic transcription. </small> <big><big><big><big><span

              class="apple-style-span"><span style="font-size: 8pt;

                color: black;" lang="EN-GB"><big><big><br>

                  </big></big></span></span></big></big></big></big><small>For

      more information, see: <a

href="http://catalog.elra.info/product_info.php?cPath=37_39&products_id=1164">http://catalog.elra.info/product_info.php?cPath=37_39&products_id=1164</a><br>

      <br>

      <b>ELRA-S0342 Acoustic database for Polish concatenative speech

        synthesis<br>

      </b>This database consists of 1443 nonsense words including all

      the diphones for the Polish language. The database includes

      information such as: the name of the diphone, context of the

      diphone, phonetic transcription in SAMPA, identifier of the wave

      file where it is placed, and three numbers: the beginning, the

      middle and the end of the diphone. </small> <big><big><big><big><span

              class="apple-style-span"><span style="font-size: 8pt;

                color: black;" lang="EN-GB"><big><big><br>

                  </big></big></span></span></big></big></big></big><small>For

      more information, see: <a

href="http://catalog.elra.info/product_info.php?cPath=37_39&products_id=1168">http://catalog.elra.info/product_info.php?cPath=37_39&products_id=1168</a><br>

      <b><br>

        ELRA-E0035 DEFT'08 Evaluation Package<br>

      </b>DEFT (DEfi Fouille de Texte – Text Mining Challenge) organizes

      evaluation campaigns in the field of text mining. The topic of
      the DEFT 2008 edition is the classification of texts by topic
      and genre. The DEFT'08 Evaluation Package makes it possible to
      compare two corpora of different genres (a corpus of newspaper
      articles extracted from Le Monde and a corpus of encyclopaedic
      articles extracted from the free online encyclopaedia,
      Wikipedia) on the basis of the same set of pre-defined categories.

    </small> <big><big><big><big><span class="apple-style-span"><span

                style="font-size: 8pt; color: black;" lang="EN-GB"><big><big><br>

                  </big></big></span></span></big></big></big></big><small>For

      more information, see: <a

        href="http://catalog.elra.info/product_info.php?products_id=1165">http://catalog.elra.info/product_info.php?products_id=1165</a><br>

      <br>

      <b>ELRA-E0039 CLEF QAST (2007-2009) – Evaluation Package<br>

      </b>The CLEF QAST (2007-2009) package contains the data used for the

      Question Answering on Speech Transcripts tracks of the CLEF

      campaigns carried out from 2007 to 2009. These tracks tested the

      performance of monolingual Question Answering systems on

      collections of audio transcriptions.</small> <big><big><big><big><span

              class="apple-style-span"><span style="font-size: 8pt;

                color: black;" lang="EN-GB"><big><big><br>

                  </big></big></span></span></big></big></big></big><small>For

      more information, see: <a

        href="http://catalog.elra.info/product_info.php?products_id=1162">http://catalog.elra.info/product_info.php?products_id=1162</a><br>

      <br>

      <b>ELRA-E0040 MEDAR Evaluation Package<br>

      </b>The MEDAR Evaluation Package was produced within the project

      MEDAR (MEDiterranean ARabic language and speech technology),

      supported by the European Commission's ICT programme. It aims to

      enable the evaluation of SLT/MT (Machine Translation) systems for
      translation tasks in the English-to-Arabic direction. </small>

    <big><big><big><big><span class="apple-style-span"><span

                style="font-size: 8pt; color: black;" lang="EN-GB"><big><big><br>

                  </big></big></span></span></big></big></big></big><small>For

      more information, see: <a

href="http://catalog.elra.info/product_info.php?cPath=42_43&products_id=1166">http://catalog.elra.info/product_info.php?cPath=42_43&products_id=1166</a><br>

      <br>

      <b>2) Updated Language Resources:</b></small> <small><br>

       <br>

      <b>ELRA-S0241 ESTER Corpus<br>

      </b><i>This new release contains 100 hours of orthographically

        transcribed broadcast news (instead of 60 hours in the previous

        release).</i></small> <small><br>

      The ESTER Corpus is a subset of the ESTER Evaluation Package

      (catalogue ref. ELRA-E0021), which was produced within the French

      national project ESTER (Evaluation of Broadcast News enriched

      transcription systems), as part of the Technolangue programme

      funded by the French Ministry of Research and New Technologies

      (MRNT). The ESTER project made it possible to carry out a campaign for the

      evaluation of Broadcast News enriched transcription systems for

      French.<br>

    </small> <big><big><big><big><span class="apple-style-span"><span

                style="font-size: 8pt; color: black;" lang="EN-GB"></span></span></big></big></big></big><small>For

      more information, see: <a

        href="http://catalog.elra.info/product_info.php?products_id=999">http://catalog.elra.info/product_info.php?products_id=999</a><br>

      <br>

      <b>ELRA-E0021 ESTER Evaluation Package<br>

      </b><i>This new release contains 100 hours of orthographically

        transcribed broadcast news (instead of 60 hours in the previous

        release).<br>

      </i>The ESTER Evaluation Package was produced within the French

      national project ESTER (Evaluation of Broadcast News enriched

      transcription systems), as part of the Technolangue programme

      funded by the French Ministry of Research and New Technologies

      (MRNT). The ESTER project made it possible to carry out a campaign for the

      evaluation of Broadcast News enriched transcription systems for

      French. </small> <small><br>

      This package includes the material that was used for the ESTER

      evaluation campaign. It includes resources, protocols, scoring

      tools, results of the campaign, etc., that were used or produced

      during the campaign. The aim of this evaluation package is to
      enable external parties to evaluate their own systems and compare

      their results with those obtained during the campaign itself. <br>

      The campaign is divided into three tasks: orthographic

      transcription, segmentation and information extraction (named

      entity tracking).</small><big><big><big><big><span

              class="apple-style-span"><span style="font-size: 8pt;

                color: black;" lang="EN-GB"><big><big><br>

                  </big></big></span></span></big></big></big></big><small>For

      more information, see: <a

        href="http://catalog.elra.info/product_info.php?products_id=995">http://catalog.elra.info/product_info.php?products_id=995</a><br>

      <br>

      <b>ELRA-M0041 Bulgarian WordNet<br>

      </b><i>This new release contains </i></small> <small>38209

      synsets<i> (instead of 23715 synsets for the previous release).<br>

      </i>The Bulgarian WordNet is a network of lexical-semantic

      relations, an electronic thesaurus with a structure modelled on

      that of the Princeton WordNet and those constructed in the

      EuroWordNet and BalkaNet projects. Bulgarian WordNet describes
      the meaning of a lexical unit by placing it within a network of
      semantic relations, such as hypernymy, meronymy, antonymy, etc. It

      contains 38209 synsets, 83493 literals, 89242 relations (including

      58095 semantic relations and 4172 extralinguistic relations). </small><big><big><big><big><span

              class="apple-style-span"><span style="font-size: 8pt;

                color: black;" lang="EN-GB"><big><big><br>

                  </big></big></span></span></big></big></big></big><small>For

      more information, see: <a

href="http://catalog.elra.info/product_info.php?cPath=42_45&products_id=802">http://catalog.elra.info/product_info.php?cPath=42_45&products_id=802</a><br>

      <br>

      <br>

      For more information on the catalogue, please contact Valérie

      Mapelli </small> <small><a moz-do-not-send="true"

        class="moz-txt-link-freetext" href="mailto:mapelli@elda.org">mailto:mapelli@elda.org</a>

      <br>

      <br>

      Visit our On-line Catalogue: </small> <small><a

        moz-do-not-send="true" class="moz-txt-link-freetext"

        href="http://catalog.elra.info">http://catalog.elra.info</a><br>

      Visit the Universal Catalogue: <a moz-do-not-send="true"

        href="http://universal.elra.info">http://universal.elra.info</a>

      <br>

      Archives of ELRA Language Resources Catalogue Updates: <a

        moz-do-not-send="true" class="moz-txt-link-freetext"

        href="http://www.elra.info/LRs-Announcements.html">http://www.elra.info/LRs-Announcements.html</a></small>

  </body>

</html>