<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<div align="center">The Linguistic Data
Consortium (LDC) would like to
announce the availability of
three new publications.<br>
<br>
LDC2007S02<br>
<a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007S02"><b>Fisher
Levantine Arabic Conversational Telephone Speech</b></a><br>
<br>
LDC2007T04<br>
<a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T04"><b>Fisher
Levantine Arabic Conversational Telephone Speech, Transcripts</b></a><br>
<br>
LDC2007V01<br>
<a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007V01"><b>TRECVID
2005 Keyframes &amp; Transcripts</b></a><br>
</div>
<br>
<hr size="2" width="100%">
<br>
(1) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007S02">Fisher
Levantine Arabic Conversational Telephone Speech</a> contains 279
conversations totaling 45 hours of speech. Levantine Arabic is spoken
along the eastern Mediterranean coast from Anatolia to the Sinai
Peninsula and encompasses the local dialects of Lebanon, Syria, Jordan,
and Palestine. There are two distinct varieties: Northern, centered on
Syria and Lebanon; and Southern, spoken in Jordan and Palestine. The
majority of speakers in Fisher Levantine Arabic Conversational
Telephone Speech are from Jordan, Lebanon, and Palestine.<br>
<br>
The conversations in this corpus are a subset of the conversations in <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S29">Levantine
Arabic QT Training Data Set 5, Speech</a>, LDC2006S29. The individual
audio files are in NIST SPHERE format. The corresponding transcripts
may
be found in <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T04">Fisher
Levantine Arabic Conversational Telephone Speech, Transcripts</a>,
LDC2007T04. <br>
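<br>
For those working directly with the audio, NIST SPHERE files begin with
a plain-ASCII header ahead of the waveform data. The Python sketch below
shows one way to read such a header; it assumes the standard layout (a
NIST_1A magic line, a header-size line, then name/type/value fields
terminated by end_head), and the file name in the usage comment is
hypothetical, not a file from this corpus.<br>
<pre>
# Read the ASCII header of a NIST SPHERE file (illustrative sketch).
def read_sphere_header(path):
    with open(path, "rb") as f:
        magic = f.readline().strip()            # expected: b"NIST_1A"
        assert magic == b"NIST_1A", "not a SPHERE file"
        header_size = int(f.readline())         # total header bytes, e.g. 1024
        rest = f.read(header_size - f.tell()).decode("ascii", "replace")
        fields = {}
        for line in rest.splitlines():
            parts = line.split(None, 2)         # e.g. "sample_rate -i 8000"
            if parts and parts[0] == "end_head":
                break
            if len(parts) == 3:
                fields[parts[0]] = parts[2]
        return fields

# hdr = read_sphere_header("fla_0001.sph")      # hypothetical file name
# hdr.get("sample_rate"), hdr.get("channel_count")
</pre>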
<div align="center">*<br>
</div>
<br>
(2) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T04">Fisher
Levantine Arabic Conversational Telephone Speech, Transcripts</a>
contains the transcripts for the 279 telephone conversations in <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007S02">Fisher
Levantine Arabic Conversational Telephone Speech</a>, LDC2007S02.
The transcripts were created with "green" and "yellow" layers using
LDC's Multi-Dialectal Transcription Tool (AMADAT). The green layer
seeks to anchor dialectal forms to similar or related Modern Standard
Arabic orthography-based forms. The yellow layer is a more careful and
detailed transcription that adds functionally necessary vowels and
marks important sociolinguistic variations and morphophonemic features.
<br>
<br>
The green layer transcripts in this corpus are a subset of the
transcripts contained in <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T07">Levantine
Arabic QT Training Data Set 5, Transcripts</a>, LDC2006T07. The yellow
layer transcription was added in this release. <br>
<br>
<div align="center">*<br>
</div>
<br>
(3) TREC Video Retrieval Evaluation (TRECVID) is sponsored by the
National Institute of Standards and Technology (NIST) to promote
progress in content-based retrieval from digital video via open,
metrics-based evaluation. The keyframes in <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007V01">TRECVID
2005 Keyframes &amp; Transcripts</a> were extracted for use in the NIST
TRECVID 2005 Evaluation. The source data used were Arabic, Chinese
and English language broadcast programming collected in November 2004.<br>
<br>
TRECVID is a laboratory-style evaluation that attempts to model
real-world situations or significant component tasks involved in such
situations. In 2005 there were four main tasks with associated tests: <br>
<br>
<ul>
  <li>shot boundary determination</li>
  <li>low-level feature extraction</li>
  <li>high-level feature extraction</li>
  <li>search (interactive, manual, and automatic)</li>
</ul>
<br>
Shots are fundamental units of video, useful for higher-level
processing. To create the master list of shots, the video was first
segmented; the results of this pass are called subshots. Because the
master shot reference is designed for use in manual assessment, a
second pass over the segmentation was made to create master shots
of at least 2 seconds in length. These master shots are the ones used
in submitting results for the feature and search tasks in the
evaluation. In the second pass, starting at the beginning of each file,
the subshots were aggregated, if necessary, until the current shot was
at least 2 seconds in duration, at which point the aggregation began
anew with the next subshot. <br>
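<br>
For concreteness, here is a minimal sketch of that second-pass rule in
Python. It illustrates the description above rather than reproducing the
evaluation's actual tooling, and it assumes subshots are given as
(start, end) times in seconds, in file order.<br>
<pre>
# Second-pass aggregation: merge consecutive subshots until each
# master shot is at least 2 seconds long (illustrative sketch).
def aggregate_master_shots(subshots, min_dur=2.0):
    masters = []
    current = None
    for start, end in subshots:
        if current is None:
            current = [start, end]
        else:
            current[1] = end  # extend the current shot with this subshot
        if current[1] - current[0] >= min_dur:
            masters.append(tuple(current))
            current = None    # aggregation begins anew with the next subshot
    if current is not None:   # a trailing shot may end up shorter than min_dur
        masters.append(tuple(current))
    return masters

print(aggregate_master_shots([(0.0, 0.8), (0.8, 1.5), (1.5, 3.0), (3.0, 6.0)]))
# [(0.0, 3.0), (3.0, 6.0)]
</pre>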
<br>
The keyframes were selected by going to the middle frame of the shot,
then searching left and right of that frame to locate the
nearest I-frame. That frame became the keyframe and was extracted.
Keyframes have been provided at both the subshot (NRKF) and master shot
(RKF) levels. <br>
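<br>
A sketch of this selection rule follows, with the same caveat that it
illustrates the description rather than the actual extraction tools.
Here is_iframe is a hypothetical predicate standing in for the decoder's
knowledge of which frames are intra-coded (I-frames); searching outward
from the middle guarantees the returned frame is the I-frame nearest the
shot's temporal center.<br>
<pre>
# Pick the I-frame nearest the middle frame of a shot (illustrative sketch).
def select_keyframe(first_frame, last_frame, is_iframe):
    middle = (first_frame + last_frame) // 2
    shot = range(first_frame, last_frame + 1)
    for offset in range(len(shot)):
        # Search outward from the middle, alternating left and right.
        for candidate in (middle - offset, middle + offset):
            if candidate in shot and is_iframe(candidate):
                return candidate
    return None  # no I-frame inside the shot

# Example: pretend every 12th frame is an I-frame (a common GOP size).
print(select_keyframe(100, 160, lambda f: f % 12 == 0))  # prints 132
</pre>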
<br>
<hr size="2" width="100%"><br>
<div align="center"><small><font face="Courier New, Courier, monospace">Ilya
Ahtaridis<br>
Membership Coordinator</font></small><br>
--------------------------------------------------------------------
<font face="Courier New, Courier, monospace"><br>
</font></div>
<div align="center">
<pre class="moz-signature" cols="72"><b>
Linguistic Data Consortium                  Phone: (215) 573-1275
University of Pennsylvania                  Fax: (215) 573-2175
3600 Market St., Suite 810                  <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA                  <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></b></pre>
</div>
</body>
</html>