<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<hr size="2" width="100%">
<div align="center"><tt><i style="mso-bidi-font-style:normal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">In
this newsletter:</span></i></tt><br>
<tt><i style="mso-bidi-font-style:normal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></i><b><i
style="mso-bidi-font-style:normal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></i></b>
</tt><br>
<b><tt><span
style="font-size:11.0pt;mso-fareast-font-family:Calibri;mso-bidi-font-weight:bold"><span
style="mso-list:Ignore">-<span style="font:7.0pt
"Times New Roman""> </span></span></span><a
href="#google">LDC and Google Collaboration Results in New
Syntactically-Annotated Language Resources</a><span
style="font-size:11.0pt;mso-ascii-font-family:
Calibri;mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><span style="mso-spacerun:yes"> -</span></span></tt></b><br>
<b> <tt><span style="font-size:11.0pt;mso-ascii-font-family:
Calibri;mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span> </tt></b><br>
<b> <tt><span
style="font-size:11.0pt;mso-fareast-font-family:Calibri;mso-bidi-font-weight:bold"><span
style="mso-list:Ignore"><span style="font:7.0pt
"Times New Roman""></span></span></span><span
style="font-size:11.0pt;mso-ascii-font-family:
Calibri;mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><span style="mso-spacerun:yes"> </span>- </span><a
href="#20th">The Future of Language Resources: LDC 20th
Anniversary Workshop</a><span style="mso-spacerun:yes"></span><span
style="font-size:11.0pt;mso-ascii-font-family:
Calibri;mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><span style="mso-spacerun:yes"> -</span></span></tt></b><br>
<b> <tt><span style="font-size:11.0pt;mso-ascii-font-family:
Calibri;mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span> </tt></b><br>
<b> <tt><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">- </span><a href="#scholar">Fall 2012 LDC Data
Scholarship Program</a><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"> -</span></tt></b><br>
<b> <tt><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span> </tt></b><br>
<b> <tt><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span> </tt></b><br>
<tt><i style="mso-bidi-font-style:normal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">New
publications:</span></i></tt><br>
<tt><i style="mso-bidi-font-style:normal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></i><b><i
style="mso-bidi-font-style:normal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></i></b>
</tt><br>
<tt><span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">LDC2012T13</span></tt><br>
<tt><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">- </span></b><b><a href="#webtb">English Web
Treebank</a><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"> -</span></b></tt><br>
<tt><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span></b> </tt><br>
<tt><span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p> </o:p>LDC2012T14</span></tt><br>
<b><tt><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">- </span><a href="#gale">GALE Phase 2 Arabic
Broadcast Conversation Parallel Text Part 2</a><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"> -</span></tt></b><br>
<tt><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span></b> </tt><br>
<tt><span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p> </o:p>LDC2012T12</span></tt><br>
<b><tt><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">- </span><a href="#time">Spanish TimeBank 1.0</a><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"> -</span><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p> </o:p></span></tt></b><br>
<tt><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span></b></tt><tt><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p> </o:p></span></b></tt>
</div>
<tt> <b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p> </o:p></span></b></tt><tt><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span></b> </tt><br>
<hr size="2" width="100%"> <tt><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><br>
</span></b><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><br>
</span></b></tt>
<div align="center"><tt><a name="google"></a> <b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">LDC and Google Collaboration Results in New
Syntactically-Annotated Language Resources</span></b></tt><br>
</div>
<tt><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"></span></b> <b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span></b> </tt><tt><br>
<span style="mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">Google Inc.<span
style="mso-spacerun:yes"> </span>and the Linguistic Data
Consortium (LDC) have collaborated to develop new
syntactically-annotated language resources that enable computers
to better understand human language. The project, funded through a gift from
Google in 2010, has resulted in the development of the <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T13">English
Web
Treebank LDC2012T13</a>, which contains over 250,000 words of
weblogs, newsgroups, email, reviews and question-answers
manually annotated for syntactic structure. This resource will
allow language technology researchers to develop and evaluate
the robustness of parsing methods in various new web domains. It
was used in the 2012 shared task on parsing English web text for
the <a href="https://sites.google.com/site/sancl2012/">First
Workshop on Syntactic Analysis of Non-Canonical Language
(SANCL)</a>, which took place at NAACL-HLT in Montreal on June
8, 2012. The English Web Treebank is available to the research
community through <a href="http://www.ldc.upenn.edu/Catalog/">LDC’s
Catalog</a>.<br>
<o:p></o:p></span> <br>
<span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">Natural language processing (NLP) is a field of
computational linguistic research concerned with the
interactions between human language and computers. Parsing is a
discipline within NLP in which computers analyze text and
determine its syntactic structure. While syntactic parsing is
already practically useful, Google funded this effort to help
the research community develop better parsers for web text. The
web texts collected and annotated by LDC provide new, diverse
data for training parsing systems. <br>
<o:p></o:p></span> <br>
<span style="mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">Google chose LDC for
this work based on the Consortium’s experience in creating
syntactically annotated corpora, also known as treebanks.
Treebanks are critically important to parsing research since
they provide human-analyzed sentence structures that facilitate
training and testing scenarios in NLP research. This work
extends the existing relationship between LDC and Google.<span
style="mso-spacerun:yes"> </span>LDC has published four
other Google-developed data sets in the past six years: English,
Chinese, Japanese and European language n-grams used principally
for language modeling. <o:p></o:p></span> <br>
<b><span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p></o:p></span></b></tt><tt><b><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><o:p> </o:p></span></b> <br>
<span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></tt><tt><span
style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p> </o:p></span>
</tt><br>
<div align="center"> <tt><span
style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><span
style="mso-spacerun:yes"></span></span> <a name="20th"></a><span
style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><b>The
Future of Language Resources: LDC 20th Anniversary Workshop
<o:p></o:p></b></span> </tt><br>
<tt><span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p> </o:p></span>
</tt><br>
</div>
<tt><span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">LDC’s
20th
Anniversary Workshop is rapidly approaching! The event will take
place on the University of Pennsylvania’s campus on September
6-7, 2012. <o:p></o:p></span> <br>
<span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p> </o:p></span>
<br>
<span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Workshop
themes
include: the developments in human language technologies and
associated resources that have brought us to our current state;
the language resources required by the technical approaches
taken and the impact of these resources on HLT progress; the
applications of HLT and resources to other disciplines including
law, medicine, economics, the political sciences and psychology;
the impact of HLTs and related technologies on linguistic
analysis and novel approaches in fields as widespread as
phonetics, semantics, language documentation, sociolinguistics
and dialect geography; and the impact of any of these
developments on the ways in which language resources are
created, shared and exploited and on the specific resources
required. <o:p></o:p></span> <br>
<span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p> </o:p></span>
<br>
<span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"></span></tt><tt><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin">Please read more <a
href="http://www.ldc.upenn.edu/About/20th_Anniversary_Workshop.html">here</a>.<br>
<br>
</span></tt><br>
<tt><span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"> </span></tt><tt><br>
<br>
</tt>
<div align="center"><tt><a name="scholar"><b><span
style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Fall
2012
LDC Data Scholarship Program</span></b></a> <b><span
style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></b><span
style="mso-bookmark:data"></span><span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"> <o:p></o:p></span> </tt><br>
<br>
</div>
<tt><span style="mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">Applications
are
now being accepted through September 17, 2012, 11:59PM EST for
the Fall 2012 LDC Data Scholarship program! The LDC Data
Scholarship program provides university students with access to
LDC data at no cost. During previous program cycles, LDC has
awarded no-cost copies of LDC data to over 20 individual
students and student research groups.<br>
<br>
This program is open to students pursuing undergraduate or
graduate studies at an accredited college or university. LDC
Data Scholarships are not restricted to any particular field of
study; however, students must demonstrate a well-developed
research agenda and a bona fide inability to pay. The selection
process is highly competitive. <br>
<br>
The application consists of two parts: <br>
<br>
</span><span style="mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
(1) <b>Data Use Proposal</b>. Applicants must submit a proposal
describing their intended use of the data. The proposal should
state which data the student plans to use and how the data will
benefit their research project, and should describe the
proposed methodology or algorithm.</span><br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
</span><br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
Applicants should consult the </span> <span
style="mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"><a
href="http://www.ldc.upenn.edu/Catalog/index.jsp"
target="_blank"><span
style="mso-fareast-font-family:"Times New
Roman";color:#0000CC">LDC Corpus Catalog</span></a></span><span
style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
for a complete list of data distributed by LDC. Due to certain
restrictions, a handful of LDC corpora are available only to members
of the Consortium. Applicants are advised to select at most
two datasets; students may apply for additional datasets
during the following cycle once they have finished processing
the initial datasets and have published or presented work in a
juried venue.</span><br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
</span><br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
(2) <b>Letter of Support</b>. Applicants must submit one letter
of support from their thesis adviser or department chair. The
letter must confirm that the department or university lacks the
funding to pay the full Non-member Fee for the data and verify
the student's need for data.</span> <br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
<br>
For further information on application materials and program
rules, please visit the </span><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
minor-latin"><a
href="http://www.ldc.upenn.edu/About/scholarships.html"
target="_blank"><span
style="mso-fareast-font-family:"Times New Roman";
color:#0000CC">LDC Data Scholarship</span></a></span><span
style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
page. <br>
<br>
Students can email their applications to the </span><span
style="mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><a
href="mailto:datascholarships@ldc.upenn.edu"><span
style="mso-fareast-font-family: "Times New
Roman";color:#0000CC">LDC Data Scholarship program</span></a></span><span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black">. Decisions will be
sent by email from the same address.<br>
<br>
The deadline for the Fall 2012 program cycle is September 17,
2012, 11:59PM EST.<br>
<o:p></o:p></span> </tt><br>
<br>
<tt><b><span style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span></b></tt><tt><span
style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"><br>
<br>
</span></tt><tt><b><span style="mso-fareast-font-family: "Times
New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"><br>
</span></b></tt>
<div align="right">
<div align="center"><tt><b><span style="mso-fareast-font-family:
"Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
New publications</span></b></tt><br>
<br>
</div>
<tt><b><span style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span></b></tt></div>
<tt><span style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"><o:p></o:p></span>
<br>
<a name="webtb"></a> <span style="mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin">(1) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T13">English
Web Treebank</a> was developed by the Linguistic Data
Consortium (LDC) with funding through a gift from Google Inc. It
consists of over 250,000 words of English weblogs, newsgroups,
email, reviews and question-answers manually annotated for
syntactic structure and is designed to allow language technology
researchers to develop and evaluate the robustness of parsing
methods in those web domains. <br>
<o:p></o:p></span> <br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">This
release
contains 254,830 word-level tokens and 16,624 sentence-level
tokens of web text in 1,174 files annotated for sentence- and
word-level tokenization, part-of-speech, and syntactic
structure. The data is roughly evenly divided across five
genres: weblogs, newsgroups, email, reviews, and
question-answers. The files were manually annotated following
the sentence-level tokenization guidelines for web text and the
word-level tokenization guidelines developed for English
treebanks in the </span><span
style="mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"><a
href="http://projects.ldc.upenn.edu/gale/index.html"
target="_blank"><span
style="mso-fareast-font-family:"Times New
Roman";color:#0000CC">DARPA GALE</span></a></span><span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black"> project. Only text
from the subject line and message body of posts, articles,
messages and question-answers was collected and annotated.<br>
<o:p></o:p></span> <br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span><span
style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">Non-members
may license this data by completing the </span><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://www.ldc.upenn.edu/Membership/Agreements/licenses/genericlicense.pdf"
target="_blank"><span
style="mso-fareast-font-family:"Times New Roman";
color:#0000CC">LDC User Agreement for Non-members</span></a></span><span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black">. The agreement can
be faxed to +1 215 573 2175 or scanned and emailed to this
address. The first fifty copies of this publication are being
made available at no charge. After the first fifty copies are
distributed, the non-member fee of US$175 applies.<br>
<br>
</span><br>
</tt><tt><span style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black"> </span><br>
</tt>
<div align="center"><tt><span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black">*</span></tt><br>
<tt><span style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black"></span></tt></div>
<tt><span style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black"> </span><br>
<a name="gale"></a> <span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black">(2) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T14">GALE
Phase 2 Arabic Broadcast Conversation Parallel Text Part 2</a>
was developed by LDC. Along with other corpora, the parallel
text in this release comprised training data for Phase 2 of the
DARPA GALE (Global Autonomous Language Exploitation) Program.
This corpus contains Modern Standard Arabic source text and
corresponding English translations selected from broadcast
conversation (BC) data collected by LDC between 2004 and 2007
and transcribed by LDC or under its direction. <br>
<o:p></o:p></span> <br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">GALE
Phase
2 Arabic Broadcast Conversation Parallel Text Part 2 includes 29
source-translation document pairs, comprising 169,488 words of
Arabic source text and its English translation. Data is drawn
from eight distinct Arabic programs broadcast between 2004 and
2007 by Aljazeera, a regional broadcaster based in
Doha, Qatar, and by Nile TV, an Egyptian broadcaster. The programs
in this release focus on current events topics.<br>
<o:p></o:p></span> <br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">The
files
in this release were transcribed by LDC staff and/or
transcription vendors under contract to LDC in accordance with
the </span><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://projects.ldc.upenn.edu/gale/Transcription/Arabic-XTransQRTR.V2.pdf"
target="_blank"><span
style="mso-fareast-font-family:"Times New Roman";
color:#0000CC">Quick Rich Transcription</span></a></span><span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black"> guidelines
developed by LDC. Transcribers indicated sentence boundaries in
addition to transcribing the text. Data was manually selected
for translation according to several criteria, including
linguistic features, transcription features and topic features.
The transcribed and segmented files were then reformatted into a
human-readable translation format and assigned to translation
vendors. Translators followed LDC's Arabic to English
translation guidelines. Bilingual LDC staff performed quality
control procedures on the completed translations. <br>
<o:p></o:p></span> <br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span></tt><br>
<br>
<div align="center">*<br>
</div>
<tt><span style="mso-fareast-font-family:"Times New
Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"><o:p></o:p></span>
<br>
<a name="time"></a> <span
style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">(3)
<a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T12">Spanish
TimeBank 1.0</a> was developed by researchers at </span><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://www.barcelonamedia.org/" target="_blank"><span
style="mso-fareast-font-family: "Times New
Roman";color:#0000CC">Barcelona Media</span></a></span><span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black"> and consists of
Spanish texts in the </span><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://clic.ub.edu/corpus/en/ancora" target="_blank"><span
style="mso-fareast-font-family:"Times New
Roman";color:#0000CC">AnCora corpus</span></a></span><span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black"> annotated with
temporal and event information according to the </span><span
style="mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin"><a
href="http://www.timeml.org/site/index.html" target="_blank"><span
style="mso-fareast-font-family:"Times New Roman";
color:#0000CC">TimeML specification language</span></a></span><span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black">.<br>
<o:p></o:p></span> <br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">Spanish
TimeBank
1.0 contains stand-off annotations for 210 documents with over
75,800 tokens (including punctuation marks) and 68,000 tokens
(excluding punctuation). The source documents are news stories
and fiction from the AnCora corpus.<br>
<o:p></o:p></span> <br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">The
AnCora
corpus is the largest multilayer annotated corpus of Spanish and
Catalan. AnCora contains 400,000 words in Spanish and 275,000
words in Catalan. The AnCora documents are annotated on many
linguistic levels including structure, syntax, dependencies,
semantics and pragmatics. That information is not included in
this release, but it can be mapped to the present annotations.
The corpus is freely available from the </span><span
style="mso-bidi-font-family:
Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://clic.ub.edu/ancora" target="_blank"><span
style="mso-fareast-font-family:"Times New Roman";
color:#0000CC">Centre de Llenguatge i Computació (CLiC)</span></a></span><span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black">.<br>
<o:p></o:p></span> <br>
<span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span><span
style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">Non-members
may license this data by completing the </span><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://www.ldc.upenn.edu/Membership/Agreements/licenses/genericlicense.pdf"
target="_blank"><span
style="mso-fareast-font-family:"Times New Roman";
color:#0000CC">LDC User Agreement for Non-members</span></a></span></tt><span
style="mso-fareast-font-family:"Times New
Roman";mso-bidi-font-family:Calibri;
mso-bidi-theme-font:minor-latin;color:black"><tt>. The agreement
can be faxed to +1 215 573 2175 or scanned and emailed to this
address. The publication is being made available at no charge.<br>
</tt></span>
<hr size="2" width="100%"><br>
<pre class="moz-signature" cols="72"><tt>
--
Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------
Linguistic Data Consortium Phone: 1 (215) 573-1275
University of Pennsylvania Fax: 1 (215) 573-2175
3600 Market St., Suite 810 <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>
</tt></pre>
<hr size="2" width="100%">
</body>
</html>