<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p class="MsoNormal"><i>New publications:</i><br>
</p>
<p class="MsoNormal"> <a href="#speech"><b>CALLFRIEND Farsi Second
Edition Speech</b></a></p>
<p class="MsoNormal"> <a href="#trans"><b>CALLFRIEND Farsi Second
Edition Transcripts</b></a></p>
<hr size="2" width="100%"><b>New Publications</b>
<p class="MsoNormal"> <a name="speech"></a>(1) <a
href="http://catalog.ldc.upenn.edu/LDC2014S01">CALLFRIEND Farsi
Second Edition Speech</a> was developed by LDC and consists of
approximately 42 hours of telephone conversation (100 recordings)
among native Farsi speakers. The calls were recorded in 1995 and
1996 as part of the CALLFRIEND collection, a project designed
primarily to support research in automatic language
identification. One hundred native Farsi speakers living in the
continental United States each made a single telephone call,
lasting up to 30 minutes, to a family member or friend living in
the United States.</p>
<p class="MsoNormal">This release contains all 100 calls from the
collection. In 1996, LDC released recordings from 60 of the calls,
without transcripts, as CALLFRIEND Farsi (<a
href="http://catalog.ldc.upenn.edu/LDC96S50">LDC96S50</a>), after
20 of those calls had been used as evaluation data in the first <a
href="http://www.itl.nist.gov/iad/mig/tests/lre/1996/">NIST
Language Recognition Evaluation</a> (LRE).</p>
<p class="MsoNormal">Corresponding transcripts are available in
CALLFRIEND Farsi Second Edition Transcripts (<a
href="http://catalog.ldc.upenn.edu/LDC2014T01">LDC2014T01</a>).</p>
<p class="MsoNormal">All recordings involved domestic calls routed
through LDC’s automated telephone collection platform and were
stored as 2-channel (4-wire), 8-kHz mu-law samples taken directly
from the public telephone network via a T-1 circuit. Each audio
file is in <a href="https://xiph.org/flac/">FLAC</a>-compressed
MS-WAV (RIFF) format and contains 2-channel, 8-kHz,
16-bit PCM sample data.</p>
<p class="MsoNormal">This release also provides speaker information,
including gender, the number of speakers on each channel and call
duration.</p>
<p class="MsoNormal" align="center">*</p>
<p class="MsoNormal"> <a name="trans"></a>(2) <a
href="http://catalog.ldc.upenn.edu/LDC2014T01">CALLFRIEND Farsi
Second Edition Transcripts</a> was developed by LDC and consists
of transcripts for approximately 42 hours of telephone
conversation (100 recordings) among native Farsi speakers. The
calls were recorded in 1995 and 1996 as part of the CALLFRIEND
collection, a project designed primarily to support research in
automatic language identification. One hundred native Farsi
speakers living in the continental United States each made a single
telephone call, lasting up to 30 minutes, to a family member or
friend living in the United States.</p>
<p class="MsoNormal">Corresponding speech data is available as
CALLFRIEND Farsi Second Edition Speech (<a
href="http://catalog.ldc.upenn.edu/LDC2014S01">LDC2014S01</a>).</p>
<p class="MsoNormal">Transcripts are presented in three formats:
romanized transcripts (*asc.txt), Arabic-script transcripts
(*ntv.txt) and a simple XML format (*.xml) that contains both
Arabic-script and romanized forms for Farsi words. In the *.txt
files, the four main fields on each line (start-offset, end-offset,
speaker-label, transcript-text) are separated by tabs. Each file
begins with a single comment line containing the file_id string,
followed immediately by the list of time-stamped segments, ordered
by their start-offset values, with no blank lines.</p>
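<p class="MsoNormal">The *.txt layout described above can be read with
a few lines of code. The following is a minimal sketch, not an LDC
tool: the names <code>Segment</code> and <code>parse_transcript</code>
are hypothetical, the sample lines are invented, and it assumes that
the offsets are decimal numbers and that the comment line is simply
the first line of the file (the "#" prefix in the sample is a guess).</p>
<pre>
```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float   # assumption: offsets are decimal numbers
    end: float
    speaker: str
    text: str


def parse_transcript(lines):
    """Split one *.txt transcript into its header line and its segments."""
    rows = [line.rstrip("\r\n") for line in lines]
    header = rows[0]  # the single comment line carrying the file_id string
    segments = []
    for row in rows[1:]:
        if not row:  # tolerate a trailing empty line
            continue
        # Four tab-separated fields; maxsplit=3 keeps any tab inside the text.
        start, end, speaker, text = row.split("\t", 3)
        segments.append(Segment(float(start), float(end), speaker, text))
    return header, segments


# Invented sample, not taken from the corpus; the "#" prefix is hypothetical.
SAMPLE = [
    "# fa_0000\n",
    "0.00\t1.25\tA\tsalam\n",
    "1.30\t2.80\tB\tsalam, khubi\n",
]

header, segments = parse_transcript(SAMPLE)
print(header)
for seg in segments:
    print(seg.speaker, seg.start, seg.end, seg.text)
```
</pre>
<p class="MsoNormal">The function accepts any iterable of text lines,
so an open file object can be passed directly; the text encoding to
use when opening the files should be taken from the corpus
documentation rather than from this sketch.</p>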
<br>
<hr size="2" width="100%">
<pre class="moz-signature" cols="72">--
Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium Phone: 1 (215) 573-1275
University of Pennsylvania Fax: 1 (215) 573-2175
3600 Market St., Suite 810 <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>
</pre>
</body>
</html>