<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<div align="center">LDC2006S16<br>
<b><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S16">CSLU
Spoltech Brazilian Portuguese Version 1.0</a></b><br>
<br>
LDC2006T09<br>
<b><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T09">Korean
Treebank Annotations Version 2.0</a></b><br>
<br>
LDC2006S13<br>
<b><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S13">N4
NATO Native and Non-Native Speech</a></b><br>
<br>
LDC2006T08<br>
<b><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T08">TimeBank
1.2</a></b><br>
<br>
<br>
The Linguistic Data Consortium (LDC) is pleased to announce the
availability of four new publications.<br>
</div>
<br>
<hr size="2" width="100%">
<br>
<div align="center"><b>New LDC Publications<br>
<br>
</b></div>
<p>(1) The <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S16">CSLU
Spoltech Brazilian Portuguese</a> corpus contains microphone
speech from a variety of regions in Brazil with phonetic and
orthographic transcriptions. The utterances consist of both read speech
(for phonetic coverage) and responses to questions (for spontaneous
speech). The corpus contains 8080 separate utterances from 477 speakers.
A total of 2540 utterances have been transcribed at the word level
(without time alignments), and 5479 utterances have been transcribed at
the phoneme level (with time alignments). <br>
</p>
<p>The data were recorded at 44.1 kHz (mono, 16 bit) and stored in
RIFF format. The microphone was connected directly to a
SoundBlaster-compatible sound card. Each prompted sentence was
hidden from view when recording began, so that the speaker might utter
it more naturally. Verification of the recording quality was
performed immediately after each utterance recording; the
data-collection software allowed the speaker to re-record utterances in
case the recording was not of sufficient quality. The acoustic
environment was not controlled, in order to allow for background
conditions that would occur in application environments. </p>
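<p>As a rough illustration of the format described above, the following
Python sketch reads the header of a RIFF/WAVE file and checks it against
the stated parameters (44.1 kHz, mono, 16 bit). It uses only the standard
wave module. The helper names and the silent stand-in file are
assumptions made for this example; they are not part of the corpus or
its documentation.</p>
<pre>
```python
import io
import wave

# Format stated in the announcement: 44.1 kHz, mono, 16-bit samples.
EXPECTED = {"rate": 44100, "channels": 1, "sample_width": 2}


def describe_wav(source):
    """Return rate, channel count, sample width (bytes) and duration in seconds."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "rate": rate,
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "seconds": frames / rate,
        }


def matches_expected(info):
    """True if the header agrees with every value in EXPECTED."""
    return all(info[key] == value for key, value in EXPECTED.items())


def make_silent_wav(seconds=1):
    """Build an in-memory stand-in file (silence) so the sketch runs without the corpus."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(44100)
        wav.writeframes(b"\x00\x00" * 44100 * seconds)
    buffer.seek(0)
    return buffer


info = describe_wav(make_silent_wav())
print(info, matches_expected(info))
```
</pre>
<p>To check a real file, a path to one of the corpus recordings could be
passed to describe_wav in place of the stand-in buffer.</p>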
<br>
<div align="center">*<br>
</div>
<b><br>
</b>(2) The <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T09">Korean
Treebank Annotations Version 2.0</a> is an extension of the
Korean English Treebank Annotations corpus, LDC2002T26 (2002). It is
an electronic corpus of Korean texts annotated with
morphological and syntactic information. The original texts for the
Korean Treebank 2.0 were selected from the Korean Newswire corpus
published by LDC, catalog number LDC2000T45, which is a collection of
Korean Press Agency news articles from June 2, 1994 to March 20, 2000.
Korean Treebank 2.0 is based on the March 2000 portion of the corpus
and includes 647 articles. The annotated corpus has many uses,
including training morphological analyzers, part-of-speech taggers
and syntactic parsers. <br>
<br>
<div align="center">*<br>
</div>
<p>(3) The <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S13">N4
NATO Native and Non-Native Speech</a> corpus was developed by the
NATO research group on Speech and Language Technology in order to
provide a military-oriented database for multilingual and non-native
speech processing studies. The NATO Speech and Language Technology
group decided to create a corpus geared towards the study of non-native
accents. The group chose naval communications as the common task
because it naturally includes a great deal of non-native speech and
because there were training facilities where data could be collected in
several countries. </p>
<p>Speech data were recorded in the naval transmission training centers
of four countries (Germany, the Netherlands, the United Kingdom, and
Canada). The material consists of native and non-native
speakers using NATO English procedure between ships and reading
the text "The North Wind and the Sun" in both English and the speaker's
native language. </p>
<div align="center">*<br>
</div>
<p>(4) The <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T08">TimeBank
1.2</a> corpus contains 183 news articles that have been
annotated with temporal information, adding events, times and temporal
links between events and times. The annotation follows the TimeML 1.2.1
specification. The most recent
information on TimeML is always available at <a
href="http://www.timeml.org">www.timeml.org</a>. </p>
<p>TimeML aims to capture and represent temporal information. This is
accomplished using four primary tag types: TIMEX3 for temporal
expressions, EVENT for temporal events, SIGNAL for temporal signals,
and LINK for representing relationships. TimeBank 1.2 is distributed
via web download.<br>
</p>
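<p>As a minimal sketch of how the four tag types might be tallied in an
annotated document, the Python example below builds a tiny tree with the
standard xml.etree.ElementTree module and counts elements per type. The
fragment, its ids and its attribute values are hypothetical and are not
taken from TimeBank. The sketch also assumes that the link tags defined
by TimeML (TLINK, SLINK and ALINK) are all counted under LINK.</p>
<pre>
```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_category(tag):
    """Map an element name to one of the four tag types, or None."""
    if tag in ("TIMEX3", "EVENT", "SIGNAL"):
        return tag
    if tag.endswith("LINK"):  # assumption: TLINK, SLINK and ALINK count as LINK
        return "LINK"
    return None


def count_categories(root):
    """Count elements under root (inclusive) by tag type."""
    counts = Counter()
    for element in root.iter():
        category = tag_category(element.tag)
        if category is not None:
            counts[category] += 1
    return counts


# Hypothetical fragment for "left on Monday"; ids and attributes are illustrative only.
root = ET.Element("TimeML")
ET.SubElement(root, "EVENT", eid="e1").text = "left"
ET.SubElement(root, "SIGNAL", sid="s1").text = "on"
ET.SubElement(root, "TIMEX3", tid="t1", type="DATE").text = "Monday"
ET.SubElement(root, "TLINK", eventID="e1", relatedToTime="t1", relType="IS_INCLUDED")

counts = count_categories(root)
print(dict(counts))
```
</pre>
<p>For a file on disk, ET.parse(path).getroot() could supply the root
element instead of the hand-built fragment.</p>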
<p>Nonmembers may
also license this data at <b>no
cost</b>; please note that a signed copy of our <a
href="http://www.ldc.upenn.edu/Catalog/nonmem_agree/generic.license.html">generic
nonmember user
agreement</a> is required.<br>
</p>
<br>
<hr size="2" width="100%">
<div align="center"><font face="Courier New"><small><big><font
face="Times New Roman"><br>
If
you need further
information, or would like to inquire about
membership in the LDC, please email <a class="moz-txt-link-abbreviated"
href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a> or call +1 215
573 1275.</font></big></small></font><br>
</div>
<div align="center">--------------------------------------------------------------------<br>
</div>
<div align="center">
<pre class="moz-signature" cols="72">Linguistic Data Consortium Phone: (215) 573-1275
University of Pennsylvania Fax: (215) 573-2175
3600 Market St., Suite 810 <a
class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a
class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>
</div>
<p><br>
</p>
</body>
</html>