<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div align="center">
<div align="left"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="#scholar"><b>- Spring 2014 LDC Data Scholarship
Program</b></a></span> -<br>
<span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span><br>
<span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><i>New
publications:</i></span><b><a
style="mso-comment-reference:dd_1;mso-comment-date:20131112T1735"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></a></b><br>
<b><a
style="mso-comment-reference:dd_1;mso-comment-date:20131112T1735"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><b>
</b></span></a></b><br>
<b><a
style="mso-comment-reference:dd_1;mso-comment-date:20131112T1735"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><b>
</b></span></a><a href="#ctb"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">-
Chinese Treebank 8.0 - </span></b></a></b><br>
<b><a href="#ctb"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">
</span></b></a></b><br>
<b><a href="#ctb"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">
</span></b></a><a href="#csc"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">-
CSC Deceptive Speech -</span></b></a></b><a href="#csc"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></b></a></div>
<a href="#csc"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></b></a></div>
<a href="#csc"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">
</span></b></a><a
style="mso-comment-reference:dd_1;mso-comment-date:20131112T1735"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></b></a><a
style="mso-comment-reference:dd_1;mso-comment-date:20131112T1735"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">
</span></b></a>
<hr size="2" width="100%"><a
style="mso-comment-reference:dd_1;mso-comment-date:20131112T1735"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></b></a><span
style="font-size:12.0pt;mso-fareast-font-family:SimSun;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"> </span><br>
<a name="scholar"></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><b><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">Spring
2014
LDC Data Scholarship Program</span></b><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"> <br>
<br>
<span style="color:black;mso-bidi-font-weight:bold">Applications
are now being accepted through Wednesday, January 15, 2014,
11:59PM EST for the Spring 2014 LDC Data Scholarship
program! The LDC Data Scholarship program provides
university students with access to LDC data at no cost.
During previous program cycles, LDC has awarded no-cost
copies of LDC data to over 35 individual students and
student research groups.</span><br>
<br>
<span style="color:black;mso-bidi-font-weight:bold">This
program is open to students pursuing both undergraduate and
graduate studies in an accredited college or university. LDC
Data Scholarships are not restricted to any particular field
of study; however, students must demonstrate a
well-developed research agenda and a bona fide inability to
pay. The selection process is highly competitive. </span><br>
<br>
<span style="color:black;mso-bidi-font-weight:bold">The
application consists of two parts: </span><br>
<br>
<span style="color:black;mso-bidi-font-weight:bold">(1) Data
Use Proposal. Applicants must submit a proposal describing
their intended use of the data. The proposal should state
which data the student plans to use, how the data will
benefit their research project, and the proposed
methodology or algorithm.</span><br>
<br>
<span style="color:black;mso-bidi-font-weight:bold">Applicants
should consult the </span></span><a
href="http://catalog.ldc.upenn.edu/" target="_blank"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">LDC <span
style="mso-spacerun:yes"> </span>Catalog</span></a><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"> for a complete list of data
distributed by LDC. A handful of
LDC corpora are restricted to members of the Consortium.
Applicants are advised to select at most two
datasets; students may apply for additional datasets during
the following cycle once they have finished processing the
initial datasets and have published or presented work in some juried
venue.<br>
<br>
<span style="color:black;mso-bidi-font-weight:bold">(2) Letter
of Support. Applicants must submit one letter of support
from their thesis adviser or department chair. The letter
must verify the student's need for data and confirm that the
department or university lacks the funding to pay the full
Non-member Fee for the data or to join the Consortium.</span>
<br>
<br>
<span style="color:black;mso-bidi-font-weight:bold">For
further information on application materials and program
rules, please visit the </span></span><a
href="https://www.ldc.upenn.edu/language-resources/data/data-scholarships"
target="_blank"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:#0000CC;
mso-bidi-font-weight:bold">LDC Data Scholarship</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> page. <br>
<br>
<span style="mso-bidi-font-weight:bold">Students can email
their applications to the </span></span><a
href="mailto:datascholarships@ldc.upenn.edu"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin;mso-bidi-font-weight:bold">LDC
Data Scholarship program</span></a><span
style="font-size:12.0pt;mso-fareast-font-family: "Times
New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black;mso-bidi-font-weight:bold">.
Decisions will be sent by email from the same address.</span><span
style="font-size:12.0pt;mso-fareast-font-family: "Times
New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><br>
<br>
<span style="color:black;mso-bidi-font-weight:bold">The
deadline for the Spring 2014 program cycle is January 15,
2014, 11:59PM EST.<br>
</span></span></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"> <b>New
publications</b><br>
<br style="mso-special-character:line-break">
</span> <a name="ctb"></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(1)
</span><a href="http://catalog.ldc.upenn.edu/LDC2013T21"><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin">Chinese Treebank 8.0</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> consists of
approximately 1.5 million words of annotated and parsed text
from Chinese newswire, government documents, magazine articles,
various broadcast news and broadcast conversation programs, web
newsgroups and weblogs.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The
Chinese Treebank project began at the University of Pennsylvania
in 1998, continued at the University of Colorado and then moved
to </span><a
href="http://www.cs.brandeis.edu/%7Ellc/page2/page2.html"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">Brandeis University</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">. The project’s goal is
to provide a large, part-of-speech tagged and fully bracketed
Chinese language corpus. The first delivery, Chinese Treebank
1.0, contained 100,000 syntactically annotated words from Xinhua
News Agency newswire. It was later corrected and released in
2001 as </span><a
href="http://catalog.ldc.upenn.edu/LDC2001T11"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">Chinese Treebank 2.0
(LDC2001T11)</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> and consisted of
approximately 100,000 words. The LDC released </span><a
href="http://catalog.ldc.upenn.edu/LDC2004T05"><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin">Chinese Treebank 4.0
(LDC2004T05)</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">, an updated version
containing roughly 400,000 words, in 2004. A year later, LDC
published the 500,000 word </span><a
href="http://catalog.ldc.upenn.edu/LDC2005T01"><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin">Chinese Treebank 5.0
(LDC2005T01)</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">. </span><a
href="http://catalog.ldc.upenn.edu/LDC2007T36"><span
style="font-size:12.0pt; mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin">Chinese Treebank 6.0
(LDC2007T36)</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">, released in 2007,
consisted of 780,000 words. </span><a
href="http://catalog.ldc.upenn.edu/LDC2010T07"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">Chinese Treebank 7.0
(LDC2010T07)</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">, released in 2010,
added new annotated newswire data, broadcast material and web
text, bringing the total to approximately one million words. Chinese
Treebank 8.0 adds new annotated data from newswire, magazine
articles and government documents.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">There
are 3,007 text files in this release, containing 71,369
sentences, 1,620,561 words and 2,589,848 characters (hanzi or
foreign). The data is provided in UTF-8 encoding, and the
annotation has Penn Treebank-style labeled brackets. Details of
the annotation standard can be found in the <span
style="mso-spacerun:yes"> </span>segmentation, POS-tagging and
bracketing guidelines included in the release. The data is
provided in four formats: raw text, word-segmented,
POS-tagged and syntactically bracketed. All files were
automatically verified and manually checked.<o:p></o:p></span></p>
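<p class="MsoNormal">As a rough illustration of the bracketed format, the
sketch below reads one Penn Treebank-style labeled-bracket string and lists
its (word, POS tag) pairs. It is a minimal Python sketch and not part of the
release: the sample tree is invented and uses romanized placeholder words,
and the function names are hypothetical. It assumes every bracket carries a
label, so files in the release may need extra handling.</p>
<pre>
```python
import re

# Invented sample in labeled-bracket notation (not taken from the corpus).
SAMPLE = "(IP (NP (NR Zhongguo)) (VP (VV fazhan) (NP (NN jingji))))"


def tokenize(text):
    # Split into "(", ")" and runs of characters that are neither space nor bracket.
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens):
    """Parse one bracketed tree; each node becomes (label, children)."""
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with an opening bracket"
        pos += 1
        label = tokens[pos]  # assumption: every bracket has a label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()


def tagged_words(tree):
    """Collect (word, POS tag) pairs from the preterminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


if __name__ == "__main__":
    for word, tag in tagged_words(parse(tokenize(SAMPLE))):
        print(word, tag)
```
</pre>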
<br>
<p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;
text-align:center;line-height:normal" align="center"><span
style="font-size:12.0pt;mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">*<br
style="mso-special-character:line-break">
<br style="mso-special-character:line-break">
<o:p></o:p></span></p>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><br>
</span> <a name="csc"></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">(2)
</span><a href="http://catalog.ldc.upenn.edu/LDC2013S09"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times
New Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">CSC Deceptive Speech</span></a><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"> was developed by
Columbia University, SRI International and the University of
Colorado Boulder. It consists of 32 hours of audio interviews
from 32 native speakers of Standard American English (16 male,
16 female) recruited from the Columbia University student
population and the community. The purpose of the study was to
distinguish deceptive speech from non-deceptive speech using
machine learning techniques on extracted features from the
corpus. <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The
participants were told that they were participating in a
communication experiment which sought to identify people who fit
the profile of the top entrepreneurs in America. To this end,
the participants performed tasks and answered questions in six
areas. They were later told that they had received low scores
in some of those areas and did not fit the profile. The subjects
then participated in an interview where they were told to
convince the interviewer that they had actually achieved high
scores in all areas and that they did indeed fit the profile.
The task of the interviewer was to determine how he thought the
subjects had actually performed, and he was allowed to ask them
any questions other than those that were part of the performed
tasks. For each question from the interviewer, subjects were
asked to indicate whether the reply was true or contained any
false information by pressing one of two pedals hidden from the
interviewer under a table.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Interviews
were conducted in a double-walled sound booth and recorded to
digital audio tape on two channels using Crown CM311A Differoid
headworn close-talking microphones, then downsampled to 16 kHz
before processing. <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:normal"><span
style="font-size:12.0pt;mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">The
interviews were orthographically transcribed by hand using the
NIST EARS transcription guidelines. Labels for local lies were
obtained automatically from the pedal-press data and
hand-corrected for alignment, and labels for global lies were
annotated during transcription based on the known scores of the
subjects versus their reported scores. The orthographic
transcription was force-aligned using the SRI telephone speech
recognizer adapted for full-bandwidth recordings. There are
several segmentations associated with the corpus: the implicit
segmentation of the pedal presses; semi-automatically derived,
hand-labeled sentence-like units (EARS SLASH-UNITS or SUs);
intonational phrase units; and the units corresponding
to each topic of the interview.<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p><br>
</p>
<hr size="2" width="100%">
<pre class="moz-signature" cols="72">--
Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium Phone: 1 (215) 573-1275
University of Pennsylvania Fax: 1 (215) 573-2175
3600 Market St., Suite 810 <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>
</pre>
</body>
</html>