<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div align="center">

      <div align="left"><span

          style="font-size:12.0pt;mso-fareast-font-family:"Times

          New Roman";

          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a

            href="#scholar"><b>-  Spring 2014 LDC Data Scholarship

              Program</b></a></span>  -<br>

        <br>
        <span

          style="font-size:12.0pt;mso-fareast-font-family:"Times

          New Roman";

          mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><i>New

            publications:</i></span><br>
        <br>

        <b><a

            style="mso-comment-reference:dd_1;mso-comment-date:20131112T1735"><span

              style="font-size:12.0pt;mso-fareast-font-family:"Times

              New Roman";

              mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><b>

              </b></span></a><a href="#ctb"><b><span

                style="font-size:12.0pt;mso-fareast-font-family:"Times

                New Roman";

                mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">- 

                Chinese Treebank 8.0  - </span></b></a></b><br>

        <b><a href="#ctb"><b><span

                style="font-size:12.0pt;mso-fareast-font-family:"Times

                New Roman";

                mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">

              </span></b></a></b><br>

        <b><a href="#ctb"><b><span

                style="font-size:12.0pt;mso-fareast-font-family:"Times

                New Roman";

                mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">

              </span></b></a><a href="#csc"><b><span

                style="font-size:12.0pt;mso-fareast-font-family:"Times

                New Roman";

                mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">- 

                CSC Deceptive Speech  -</span></b></a></b></div>
    </div>

    <hr size="2" width="100%">


    <p class="MsoNormal"

      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;

      line-height:normal"><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";

        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"> </span><br>

      <a name="scholar"></a><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";

        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><b><span

            style="font-size:12.0pt;mso-fareast-font-family:"Times

            New Roman";

mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">Spring

2014

            LDC Data Scholarship Program</span></b><span

          style="font-size:12.0pt; mso-fareast-font-family:"Times

          New Roman";mso-bidi-font-family:Calibri;

          mso-bidi-theme-font:minor-latin"> <br>

          <br>

          <span style="color:black;mso-bidi-font-weight:bold">Applications

            are now being accepted through Wednesday, January 15, 2014,

            11:59PM EST for the Spring 2014 LDC Data Scholarship
            program! The LDC Data Scholarship program provides
            university students with access to LDC data at no cost.

            During previous program cycles, LDC has awarded no-cost

            copies of LDC data to over 35 individual students and

            student research groups.</span><br>

          <br>

          <span style="color:black;mso-bidi-font-weight:bold">This

            program is open to students pursuing both undergraduate and

            graduate studies in an accredited college or university. LDC

            Data Scholarships are not restricted to any particular field

            of study; however, students must demonstrate a

            well-developed research agenda and a bona fide inability to

            pay. The selection process is highly competitive. </span><br>

          <br>

          <span style="color:black;mso-bidi-font-weight:bold">The

            application consists of two parts: </span><br>

          <br>

          <span style="color:black;mso-bidi-font-weight:bold">(1) Data

            Use Proposal. Applicants must submit a proposal describing

            their intended use of the data. The proposal should state
            which data the student plans to use and how the data will
            benefit their research project, and should describe the
            proposed methodology or algorithm.</span><br>

          <br>

          <span style="color:black;mso-bidi-font-weight:bold">Applicants

            should consult the </span></span><a

          href="http://catalog.ldc.upenn.edu/" target="_blank"><span

            style="font-size:12.0pt;mso-fareast-font-family:"Times

            New Roman";mso-bidi-font-family:

Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">LDC Catalog</span></a><span

          style="font-size:12.0pt; mso-fareast-font-family:"Times

          New Roman";mso-bidi-font-family:Calibri;

          mso-bidi-theme-font:minor-latin"> for a complete list of data

          distributed by LDC. Due to certain restrictions, a handful of
          LDC corpora are available only to members of the Consortium.
          Applicants are advised to select no more than two datasets;
          students may apply for additional datasets during the
          following cycle once they have finished processing the initial
          datasets and have published or presented work in a juried
          venue.<br>

          <br>

          <span style="color:black;mso-bidi-font-weight:bold">(2) Letter

            of Support. Applicants must submit one letter of support

            from their thesis adviser or department chair. The letter

            must verify the student's need for data and confirm that the

            department or university lacks the funding to pay the full

            Non-member Fee for the data or to join the Consortium.</span>

          <br>

          <br>

          <span style="color:black;mso-bidi-font-weight:bold">For

            further information on application materials and program

            rules, please visit the </span></span><a

href="https://www.ldc.upenn.edu/language-resources/data/data-scholarships"

          target="_blank"><span

            style="font-size:12.0pt;mso-fareast-font-family:"Times

            New Roman";

            mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:#0000CC;

            mso-bidi-font-weight:bold">LDC Data Scholarship</span></a><span

          style="font-size:12.0pt;mso-fareast-font-family:"Times

          New Roman";mso-bidi-font-family:

          Calibri;mso-bidi-theme-font:minor-latin"> page. <br>

          <br>

          <span style="mso-bidi-font-weight:bold">Students can email

            their applications to the </span></span><a

          href="mailto:datascholarships@ldc.upenn.edu"><span

            style="font-size:12.0pt;mso-fareast-font-family:"Times

            New Roman";mso-bidi-font-family:

Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">LDC

            Data Scholarship program</span></a><span

          style="font-size:12.0pt;mso-fareast-font-family: "Times

          New

Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;mso-bidi-font-weight:bold">.

          Decisions will be sent by email from the same address.</span><span

          style="font-size:12.0pt;mso-fareast-font-family: "Times

          New

Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><br>

          <br>

          <span style="color:black;mso-bidi-font-weight:bold">The

            deadline for the Spring 2014 program cycle is January 15,

            2014, 11:59PM EST.<br>

          </span></span></span></p>

    <p class="MsoNormal"

      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;

      line-height:normal"><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";

        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"> <b>New

          publications</b><br>

        <br style="mso-special-character:line-break">

      </span> <a name="ctb"></a><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";

        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(1)

      </span><a href="http://catalog.ldc.upenn.edu/LDC2013T21"><span

          style="font-size:12.0pt; mso-fareast-font-family:"Times

          New Roman";mso-bidi-font-family:Calibri;

          mso-bidi-theme-font:minor-latin">Chinese Treebank 8.0</span></a><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";mso-bidi-font-family:

        Calibri;mso-bidi-theme-font:minor-latin"> consists of

        approximately 1.5 million words of annotated and parsed text

        from Chinese newswire, government documents, magazine articles,

        various broadcast news and broadcast conversation programs, web

        newsgroups and weblogs.<o:p></o:p></span></p>

    <p class="MsoNormal"

      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;

      line-height:normal"><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";

        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The

        Chinese Treebank project began at the University of Pennsylvania

        in 1998, continued at the University of Colorado and then moved

        to </span><a

        href="http://www.cs.brandeis.edu/%7Ellc/page2/page2.html"><span

          style="font-size:12.0pt;mso-fareast-font-family:"Times

          New Roman";mso-bidi-font-family:

          Calibri;mso-bidi-theme-font:minor-latin">Brandeis University</span></a><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";mso-bidi-font-family:

        Calibri;mso-bidi-theme-font:minor-latin">. The project’s goal is

        to provide a large, part-of-speech tagged and fully bracketed

        Chinese language corpus. The first delivery, Chinese Treebank

        1.0, contained 100,000 syntactically annotated words from Xinhua

        News Agency newswire. It was later corrected and released in

        2001 as </span><a

        href="http://catalog.ldc.upenn.edu/LDC2001T11"><span

          style="font-size:12.0pt;mso-fareast-font-family:"Times

          New Roman";mso-bidi-font-family:

          Calibri;mso-bidi-theme-font:minor-latin">Chinese Treebank 2.0

          (LDC2001T11)</span></a><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";mso-bidi-font-family:

        Calibri;mso-bidi-theme-font:minor-latin"> and consisted of

        approximately 100,000 words. The LDC released </span><a

        href="http://catalog.ldc.upenn.edu/LDC2004T05"><span

          style="font-size:12.0pt; mso-fareast-font-family:"Times

          New Roman";mso-bidi-font-family:Calibri;

          mso-bidi-theme-font:minor-latin">Chinese Treebank 4.0

          (LDC2004T05)</span></a><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";mso-bidi-font-family:

        Calibri;mso-bidi-theme-font:minor-latin">, an updated version

        containing roughly 400,000 words, in 2004. A year later, LDC

        published the 500,000-word </span><a

        href="http://catalog.ldc.upenn.edu/LDC2005T01"><span

          style="font-size:12.0pt; mso-fareast-font-family:"Times

          New Roman";mso-bidi-font-family:Calibri;

          mso-bidi-theme-font:minor-latin">Chinese Treebank 5.0

          (LDC2005T01)</span></a><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";mso-bidi-font-family:

        Calibri;mso-bidi-theme-font:minor-latin">. </span><a

        href="http://catalog.ldc.upenn.edu/LDC2007T36"><span

          style="font-size:12.0pt; mso-fareast-font-family:"Times

          New Roman";mso-bidi-font-family:Calibri;

          mso-bidi-theme-font:minor-latin">Chinese Treebank 6.0

          (LDC2007T36)</span></a><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";mso-bidi-font-family:

        Calibri;mso-bidi-theme-font:minor-latin">, released in 2007,

        consisted of 780,000 words. </span><a

        href="http://catalog.ldc.upenn.edu/LDC2010T07"><span

          style="font-size:12.0pt;mso-fareast-font-family:"Times

          New Roman";mso-bidi-font-family:

          Calibri;mso-bidi-theme-font:minor-latin">Chinese Treebank 7.0

          (LDC2010T07)</span></a><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";mso-bidi-font-family:

        Calibri;mso-bidi-theme-font:minor-latin">, released in 2010,

        added new annotated newswire data, broadcast material and web
        text, bringing the total to approximately one million words. Chinese

        Treebank 8.0 adds new annotated data from newswire, magazine

        articles and government documents.<o:p></o:p></span></p>

    <p class="MsoNormal"

      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;

      line-height:normal"><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";

        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">There

        are 3,007 text files in this release, containing 71,369
        sentences, 1,620,561 words and 2,589,848 characters (hanzi or
        foreign). The data is provided in UTF-8 encoding, and the
        annotation has Penn Treebank-style labeled brackets. Details of
        the annotation standard can be found in the segmentation,
        POS-tagging and bracketing guidelines included in the release.
        The data is provided in four formats: raw text, word-segmented,
        POS-tagged and syntactically bracketed. All files were
        automatically verified and manually checked.<o:p></o:p></span></p>

    <br>

    <p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;

      text-align:center;line-height:normal" align="center"><span

        style="font-size:12.0pt;mso-fareast-font-family: "Times New

Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">*<br

          style="mso-special-character:line-break">

        <br style="mso-special-character:line-break">

        <o:p></o:p></span></p>

    <p class="MsoNormal"

      style="margin-bottom:0in;margin-bottom:.0001pt;line-height:

      normal"><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";

        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><br>

      </span> <a name="csc"></a><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";

        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(2)

      </span><a href="http://catalog.ldc.upenn.edu/LDC2013S09"><span

          style="font-size:12.0pt;mso-fareast-font-family:"Times

          New Roman";mso-bidi-font-family:

          Calibri;mso-bidi-theme-font:minor-latin">CSC Deceptive Speech</span></a><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";mso-bidi-font-family:

        Calibri;mso-bidi-theme-font:minor-latin"> was developed by

        Columbia University, SRI International and the University of
        Colorado Boulder. It consists of 32 hours of audio interviews

        from 32 native speakers of Standard American English (16 male,

        16 female) recruited from the Columbia University student

        population and the community. The purpose of the study was to

        distinguish deceptive speech from non-deceptive speech using

        machine learning techniques on features extracted from the

        corpus. <o:p></o:p></span></p>

    <p class="MsoNormal"

      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;

      line-height:normal"><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";

        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The

        participants were told that they were participating in a

        communication experiment which sought to identify people who fit

        the profile of the top entrepreneurs in America. To this end,

        the participants performed tasks and answered questions in six

        areas. They were later told that they had received low scores

        in some of those areas and did not fit the profile. The subjects

        then participated in an interview where they were told to

        convince the interviewer that they had actually achieved high

        scores in all areas and that they did indeed fit the profile.

        The task of the interviewer was to determine how he thought the

        subjects had actually performed, and he was allowed to ask them

        any questions other than those that were part of the performed

        tasks. For each question from the interviewer, subjects were

        asked to indicate whether the reply was true or contained any

        false information by pressing one of two pedals hidden from the

        interviewer under a table.<o:p></o:p></span></p>

    <p class="MsoNormal"

      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;

      line-height:normal"><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";

        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Interviews

        were conducted in a double-walled sound booth and recorded to

        digital audio tape on two channels using Crown CM311A Differoid

        headworn close-talking microphones, then downsampled to 16 kHz

        before processing. <o:p></o:p></span></p>

    <p class="MsoNormal"

      style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;

      line-height:normal"><span

        style="font-size:12.0pt;mso-fareast-font-family:"Times New

        Roman";

        mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The

        interviews were orthographically transcribed by hand using the

        NIST EARS transcription guidelines. Labels for local lies were

        obtained automatically from the pedal-press data and

        hand-corrected for alignment, and labels for global lies were

        annotated during transcription based on the known scores of the

        subjects versus their reported scores. The orthographic

        transcription was force-aligned using the SRI telephone speech

        recognizer adapted for full-bandwidth recordings. There are

        several segmentations associated with the corpus: the implicit

        segmentation of the pedal presses, semi-automatically derived
        sentence-like units (EARS SLASH-UNITS or SUs) that were hand
        labeled, intonational phrase units and the units corresponding

        to each topic of the interview.<o:p></o:p></span></p>


    <p class="MsoNormal"><o:p> </o:p><br>

    </p>

    <hr size="2" width="100%">

    <pre class="moz-signature" cols="72">-- 

--

Ilya Ahtaridis

Membership Coordinator

--------------------------------------------------------------------

Linguistic Data Consortium                  Phone: 1 (215) 573-1275

University of Pennsylvania                    Fax: 1 (215) 573-2175

3600 Market St., Suite 810                        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104 USA                 <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>

</pre>


  </body>

</html>