<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p class="MsoNormal"><i>New publications:</i><br>

    </p>

    <p class="MsoNormal"> <a href="#speech"><b>CALLFRIEND Farsi Second

          Edition Speech</b></a><b><br>

      </b></p>

    <p class="MsoNormal"> <a href="#trans"><b>CALLFRIEND Farsi Second

          Edition Transcripts</b></a></p>

    <hr size="2" width="100%"><b>New Publications</b>

    <p class="MsoNormal"> <a name="speech"></a>(1) <a

        href="http://catalog.ldc.upenn.edu/LDC2014S01">CALLFRIEND Farsi

        Second Edition Speech</a> was developed by LDC and consists of

      approximately 42 hours of telephone conversation (100 recordings)

      among native Farsi speakers. The calls were recorded in 1995 and

      1996 as part of the CALLFRIEND collection, a project designed

      primarily to support research in automatic language

      identification. One hundred native Farsi speakers living in the

      continental United States each made a single telephone call,

      lasting up to 30 minutes, to a family member or friend living in

      the United States.</p>

    <p class="MsoNormal">This release contains all 100 calls from the
      collection. In 1996, LDC released recordings from 60 of the calls,
      without transcripts, as CALLFRIEND Farsi (<a
        href="http://catalog.ldc.upenn.edu/LDC96S50">LDC96S50</a>) after
      20 of those calls were used as evaluation data in the first <a
        href="http://www.itl.nist.gov/iad/mig/tests/lre/1996/">NIST
        Language Recognition Evaluation</a> (LRE).</p>

    <p class="MsoNormal">Corresponding transcripts are available in

      CALLFRIEND Farsi Second Edition Transcripts (<a

        href="http://catalog.ldc.upenn.edu/LDC2014T01">LDC2014T01</a>).</p>

    <p class="MsoNormal">All recordings are of domestic calls routed
      through LDC’s automated telephone collection platform and were
      stored as 2-channel (4-wire), 8-kHz mu-law samples taken directly
      from the public telephone network via a T-1 circuit. Each audio
      file is a <a href="https://xiph.org/flac/">FLAC</a>-compressed
      MS-WAV (RIFF) file containing 2-channel, 8-kHz, 16-bit PCM sample
      data.</p>
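    <p class="MsoNormal">The relation between 8-bit mu-law samples and
      16-bit PCM values can be illustrated with a minimal Python sketch.
      It assumes the standard ITU-T G.711 mu-law expansion; this
      announcement does not describe the conversion LDC actually
      applied, and the function name ulaw_to_pcm16 is hypothetical.
      Plain arithmetic stands in for the usual bit operations.</p>

    <pre>
```python
def ulaw_to_pcm16(byte):
    """Expand one 8-bit G.711 mu-law code (0..255) to a signed 16-bit PCM value."""
    # Assumption: standard G.711 mu-law, where codes are stored bit-inverted.
    u = 255 - byte
    negative = u // 128 == 1      # top bit carries the sign
    exponent = (u // 16) % 8      # next three bits select the segment
    mantissa = u % 16             # low four bits give the step within the segment
    magnitude = (mantissa * 8 + 132) * 2 ** exponent - 132
    return -magnitude if negative else magnitude


if __name__ == "__main__":
    # Print the expansion of the two extreme codes and the two zero codes.
    for code in (0x00, 0x7F, 0x80, 0xFF):
        print(code, ulaw_to_pcm16(code))
```
    </pre>

    <p class="MsoNormal">Under that assumption, every one of the 256
      mu-law codes maps to a value that fits in a signed 16-bit
      sample.</p>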

    <p class="MsoNormal">This release includes call metadata: speaker
      gender, the number of speakers on each channel, and call
      duration.</p>

    <p class="MsoNormal" align="center">*</p>

    <p class="MsoNormal"> <a name="trans"></a>(2) <a

        href="http://catalog.ldc.upenn.edu/LDC2014T01">CALLFRIEND Farsi

        Second Edition Transcripts</a> was developed by LDC and consists

      of transcripts for approximately 42 hours of telephone

      conversation (100 recordings) among native Farsi speakers. The

      calls were recorded in 1995 and 1996 as part of the CALLFRIEND

      collection, a project designed primarily to support research in

      automatic language identification. One hundred native Farsi
      speakers living in the continental United States each made a
      single telephone call, lasting up to 30 minutes, to a family
      member or friend living in the United States.</p>

    <p class="MsoNormal">Corresponding speech data is available as

      CALLFRIEND Farsi Second Edition Speech (<a

        href="http://catalog.ldc.upenn.edu/LDC2014S01">LDC2014S01</a>).</p>

    <p class="MsoNormal">Transcripts are presented in three formats:
      romanized transcripts (*asc.txt), Arabic-script transcripts
      (*ntv.txt) and a simple XML format (*.xml) that contains both the
      Arabic-script and romanized forms for Farsi words. In the *.txt
      files, the four main fields on each line (start-offset,
      end-offset, speaker-label, transcript-text) are separated by tabs.
      Each file begins with a single comment line containing the file_id
      string. This is followed immediately by the list of time-stamped
      segments, ordered by their start-offset values, with no blank
      lines.</p>
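    <p class="MsoNormal">The *.txt layout described above can be read
      with a short Python sketch. Several details are assumptions not
      stated in this announcement: that the offsets are plain numbers,
      that the comment line starts with "#", and the sample file_id,
      speaker labels and text, which are invented for illustration. The
      names Segment and parse_transcript are hypothetical.</p>

    <pre>
```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float   # start-offset (assumed to be a number, e.g. seconds)
    end: float     # end-offset (same assumption)
    speaker: str   # speaker-label
    text: str      # transcript-text


def parse_transcript(lines):
    """Return (comment_line, segments) from the lines of one *.txt transcript."""
    rows = [line.rstrip("\r\n") for line in lines]
    comment, body = rows[0], rows[1:]   # first line is the file_id comment
    segments = []
    for row in body:
        # Split on the first three tabs only, so any tab inside the text survives.
        start, end, speaker, text = row.split("\t", 3)
        segments.append(Segment(float(start), float(end), speaker, text))
    return comment, segments


# Invented sample in the described layout; not taken from the corpus.
SAMPLE = [
    "# fa_0000\n",
    "0.00\t1.25\tA\tsalam\n",
    "1.25\t3.10\tB\tsalam, chetori\n",
]

if __name__ == "__main__":
    comment, segments = parse_transcript(SAMPLE)
    print(comment)
    for seg in segments:
        print(seg.start, seg.end, seg.speaker, seg.text)
```
    </pre>

    <p class="MsoNormal">Because the segments are already ordered by
      start-offset and no blank lines occur, the sketch needs neither
      sorting nor blank-line handling.</p>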

    <br>

    <hr size="2" width="100%">

    <pre class="moz-signature" cols="72">-- 


Ilya Ahtaridis

Membership Coordinator

--------------------------------------------------------------------

Linguistic Data Consortium                  Phone: 1 (215) 573-1275

University of Pennsylvania                    Fax: 1 (215) 573-2175

3600 Market St., Suite 810                        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104 USA                 <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>

</pre>

  </body>

</html>