<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p class="MsoNormal" align="left"><b><a href="#fall">Fall 2014
Data Scholarship Recipients</a></b><br>
<b><br>
</b><b><a href="#spring">Spring 2015 Data Scholarship Program</a><br>
</b><br>
<b><a href="#twitter">LDC is now on Twitter </a><br>
</b></p>
<i>New publications:</i>
<p class="MsoNormal" align="left"><b><a href="#lies">Boulder Lies
and Truth</a></b><b><br>
</b><b><br>
</b><b><a href="#galece">GALE Chinese-English Word Alignment and
Tagging -- Broadcast Training Part 2</a></b><b><br>
</b><b><br>
</b><b><a href="#galep2">GALE Phase 2 Chinese Web Parallel Text</a></b></p>
<hr size="2" width="100%">
<p class="MsoNormal"><a name="fall"></a><b>Fall 2014 Data
Scholarship Recipients</b><o:p></o:p></p>
<p class="MsoNormal">LDC is pleased to announce the student
recipients of the Fall 2014 <a
href="https://www.ldc.upenn.edu/language-resources/data/data-scholarships">LDC
Data
Scholarship program</a>. The following
students have received no-cost copies of LDC data:<o:p></o:p></p>
<blockquote>
<p class="MsoNormal">Mohammed Abumatar ~ University of Jordan
(Jordan), BSc candidate, Computer Engineering. Mohammed has
been awarded copies of MADCAT Phase 1-3 Training Data for his
work in handwriting recognition.<br>
<br>
Ramy Baly ~ American University of Beirut (Lebanon), PhD
candidate, Electrical and Computer Engineering. Ramy has been
awarded copies of Arabic Treebank Parts 1-3 for his work in
opinion mining.<br>
<br>
Abbas Khosravanai ~ Amirkabir University of Technology (Iran),
PhD candidate, Computer Engineering. Abbas has been awarded a
copy of 2008 NIST Speaker Recognition for his work in robust
speaker recognition.<br>
<br>
Phuc Nguyen ~ University of North Texas (USA), PhD candidate,
Computer Science and Engineering. Phuc has been awarded a copy
of Message Understanding Conference (MUC) 7 for his work in
named entity recognition.<o:p></o:p></p>
</blockquote>
<p class="MsoNormal"><br>
<a name="spring"></a><b>Spring 2015 Data Scholarship Program</b><o:p></o:p></p>
<p class="MsoNormal">Applications are now being accepted through
Thursday, January 15, 2015, 11:59PM EST for the Spring 2015 LDC
Data Scholarship program. The LDC Data Scholarship program
provides university students with access to LDC data at no cost.
During previous program cycles, LDC has awarded no-cost copies of
LDC data to over 40 individual students and student research
groups. This program is open to students pursuing either
undergraduate or graduate studies in an accredited college or
university. LDC Data Scholarships are not restricted to any
particular field of study; however, students must demonstrate a
well-developed research agenda and a bona fide inability to pay. <o:p></o:p></p>
<p class="MsoNormal"><br>
The application consists of two parts: <br>
<br>
(1) Data Use Proposal. Applicants must submit a proposal
describing their intended use of the data. The proposal should
state which data the student plans to use and how the data will
benefit their research project, and it should include information
on the proposed methodology or algorithm.<br>
<br>
(2) Letter of Support. Applicants must submit one letter of
support from their thesis adviser or department chair. The letter
must verify the student's need for data and confirm that the
department or university lacks the funding to pay the full
non-member fee for the data or to join the Consortium. <br>
<br>
For further information on application materials and program
rules, please visit the <a
href="https://www.ldc.upenn.edu/language-resources/data/data-scholarships"
target="_blank">LDC Data Scholarship</a> page. <br>
<br>
Students can email their applications to the <a
href="mailto:datascholarships@ldc.upenn.edu">LDC Data
Scholarship program</a>. Decisions will be sent by email from
the same address.<br>
<br>
The deadline for the Spring 2015 program cycle is January 15,
2015, 11:59PM EST.<o:p></o:p></p>
<p class="MsoNormal"><br>
<a name="twitter"></a><b>LDC is now on Twitter </b><br>
<br>
LDC now has a Twitter <a href="https://twitter.com/LDCupenn">feed</a>.
Start following us today for updates on new corpus releases and
the latest LDC news.<o:p></o:p></p>
<p class="MsoNormal"><br>
<br>
<b>New publications</b><br>
<br>
<a name="lies"></a>(1) <a
href="https://catalog.ldc.upenn.edu/LDC2014T24">Boulder Lies and
Truth</a> was developed at the University of Colorado Boulder
and contains approximately 1,500 elicited English reviews of
hotels and electronics for the purpose of studying deception in
written language. Reviews were collected by crowdsourcing with
Amazon Mechanical Turk.<o:p></o:p></p>
<p class="MsoNormal">Each review was required to be original and was
checked for plagiarism against the web. Reviews were annotated
with respect to the following three dimensions:<o:p></o:p></p>
<blockquote>
<p class="MsoNormal">Domain: Electronics (e.g., iPhone) or Hotels<o:p></o:p></p>
<p class="MsoNormal">Sentiment: Positive or Negative<o:p></o:p></p>
<p class="MsoNormal">Truth Value:<o:p></o:p></p>
<blockquote>
<p class="MsoNormal">a) Truthful: a review about an object known
to the writer, reflecting the writer's real sentiment toward
the object of the review<o:p></o:p></p>
<p class="MsoNormal">b) Opposition: a review about an object known
to the writer, reflecting the opposite of the writer's sentiment
toward the object of the review (i.e., if the writer liked the
object, they were asked to write a negative review; if the writer
did not like the object, they were asked to write a positive
review)<o:p></o:p></p>
<p class="MsoNormal">c) Deceptive (i.e., fabricated): a review,
either positive or negative in sentiment, written about an object
not known to the writer; the objects reviewed were provided via
a URL from the tasks in (a) and (b)<o:p></o:p></p>
</blockquote>
</blockquote>
<p class="MsoNormal">Each review was judged a total of 30 times:
(1) 10 times for its perceived quality (on a scale of 1-5);
(2) 10 times for its perceived truthfulness (e.g., truthful or
somehow deceptive, a lie or a fabrication); and (3) 10 times for
its perceived sentiment (i.e., star rating).<o:p></o:p></p>
<p class="MsoNormal">This data is available at no cost under this <a
href="https://catalog.ldc.upenn.edu/license/boulder-lies-and-truth.pdf">user
license
agreement</a>.<br>
<o:p></o:p></p>
<p class="MsoNormal" align="center">*</p>
<p class="MsoNormal"><a name="galece"></a>(2) <a
href="https://catalog.ldc.upenn.edu/LDC2014T25">GALE
Chinese-English Word Alignment and Tagging -- Broadcast Training
Part 2</a> was developed by LDC and contains 65,069 tokens of
word-aligned Chinese and English parallel text enriched with
linguistic tags. This material was used as training data in the
DARPA GALE (Global Autonomous Language Exploitation) program.<o:p></o:p></p>
<p class="MsoNormal">Some approaches to statistical machine
translation incorporate linguistic knowledge into word-aligned
text as a means of improving automatic word alignment and machine
translation quality. Here, this is accomplished with two
annotation schemes: alignment and tagging. Alignment identifies
minimum translation units and translation relations by using
minimum-match and attachment annotation approaches. The tagging
scheme defines a set of word tags and alignment link tags to
describe these translation units and relations. Tagging adds
contextual, syntactic and language-specific features to the
alignment annotation.<o:p></o:p></p>
<p class="MsoNormal">This release consists of Chinese source
broadcast conversation (BC) programming collected by LDC in 2008.
<o:p></o:p></p>
<p class="MsoNormal">The Chinese word alignment tasks consisted of
the following components:<o:p></o:p></p>
<blockquote>
<p class="MsoNormal">Identifying, aligning, and tagging eight
different types of links<o:p></o:p></p>
<p class="MsoNormal">Identifying, attaching, and tagging
local-level unmatched words<o:p></o:p></p>
<p class="MsoNormal">Identifying and tagging
sentence/discourse-level unmatched words<o:p></o:p></p>
<p class="MsoNormal">Identifying and tagging all instances of
Chinese 的 (DE) except when they were part of a semantic link<o:p></o:p></p>
</blockquote>
<p class="MsoNormal" align="center">*<br>
</p>
<p class="MsoNormal"><a name="galep2"></a>(3) <a
href="https://catalog.ldc.upenn.edu/LDC2014T26">GALE Phase 2
Chinese Web Parallel Text</a> was developed by LDC. Along with
other corpora, the parallel text in this release comprised
training data for Phase 2 of the DARPA GALE (Global Autonomous
Language Exploitation) Program. This corpus contains Chinese
source text and corresponding English translations selected from
weblog and newsgroup data collected by LDC and translated by LDC
or under its direction.<o:p></o:p></p>
<p class="MsoNormal">This release includes 46 source-translation
document pairs, comprising 66,779 tokens of translated data. Data
is drawn from four Chinese weblog and newsgroup sources.<o:p></o:p></p>
<p class="MsoNormal">Data was manually selected for translation
according to several criteria, including linguistic features and
topic features. The files were formatted into a human-readable
translation format and assigned to translation vendors.
Translators followed LDC's Chinese to English translation
guidelines. Bilingual LDC staff performed quality control
procedures on the completed translations.<o:p></o:p></p>
<br>
<hr size="2" width="100%"><br>
<pre class="moz-signature" cols="72">--
Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium Phone: 1 (215) 573-1275
University of Pennsylvania Fax: 1 (215) 573-2175
3600 Market St., Suite 810 <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>
</pre>
</body>
</html>