<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body bgcolor="#ffffff" text="#000000">
<p align="center"><b>
- Programmer Analyst Positions at LDC -</b><br>
<br>
<br>
</p>
<p align="center"><font color="#000000">LDC2008T13 <br>
<b>- <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T13">BLLIP
North American News Text, Complete</a> -<br>
</b>LDC2008T14 <br>
<b>- <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T14">BLLIP
North American News Text, General Release</a> -<br>
<br>
</b><br>
LDC2008T15 <br>
<b>- <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T15">North
American News Text, Complete</a> -</b><br>
LDC2008T16 <br>
<b>- <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T16">North
American News Text, General Release</a> -<br>
<br>
</b></font></p>
<p align="center"><b>The Linguistic Data Consortium (LDC) would like to
announce position openings for programmer analysts and the availability
of new publications.<br>
</b></p>
<hr size="2" width="100%">
<p align="center"><b><br>
</b></p>
<p style="margin-bottom: 12pt; text-align: center;" align="center"><b>Programmer
Analyst Positions at LDC<br>
</b></p>
<p style="margin-bottom: 12pt; text-align: center;" align="center"><br>
</p>
<p style="margin-bottom: 12pt;">The LDC at the University of Pennsylvania has several
immediate openings for full-time programmer analysts.</p>
<ul type="disc">
<li class="MsoNormal" style="">Programmer Analyst - Text and Speech
Annotation Support (#080725253)</li>
</ul>
<p style="margin-bottom: 12pt;"><br>
Duties: This position will support LDC's language resource creation
projects by providing programming, technical and research support in a
lead capacity. Primary responsibilities will be to design, develop and
implement programming solutions and oversee all technical aspects of the
projects; work with LDC's project managers, annotators, programmers and
clients to develop achievable plans for corpus or software development
and execute them successfully; write annotation tools, data processing
tools, web applications and other software the projects require; support
annotation workflow; support end users; and investigate technical issues
that arise during the life cycles of projects and provide timely
solutions as necessary.</p>
<ul type="disc">
<li class="MsoNormal" style="">Programmer Analyst - Arabic
Treebank (#080324301)</li>
</ul>
<p style="margin-bottom: 12pt;"><br>
Duties: Same as above; this position will primarily support the Arabic
Treebank and other Arabic-related projects. (Knowledge of Arabic grammar
and the ability to read Arabic are highly preferred for this position.)</p>
<ul type="disc">
<li class="MsoNormal" style="">Programmer Analyst - External
Relations (#080725188)</li>
</ul>
<p style="margin-bottom: 12pt;"><br>
Duties: This position will support LDC's External Relations Group by
designing, developing, coding and providing support for LDC's business
systems. These systems support the organization's membership, sales and
time-tracking activities; their features include invoicing, member
tracking and reporting functions. This position will also coordinate and
prepare publications of language resources, such as video,
computer-readable speech, software and text data, used for human language
technology research and technology development.<br>
<br>
For further information on the duties and qualifications for these
positions, or to apply online, please visit <a href="http://jobs.hr.upenn.edu/">http://jobs.hr.upenn.edu/</a>
and search the postings for the reference numbers indicated above.<br>
<br>
Penn offers an excellent benefits package, including medical/dental,
retirement plans, tuition assistance and a minimum of three weeks of
paid vacation per year. The University of Pennsylvania is an affirmative
action/equal opportunity employer. Positions are contingent upon
funding.<br>
<br>
For more information about LDC and the programs we support, visit <a
href="http://www.ldc.upenn.edu/">http://www.ldc.upenn.edu/</a>.</p>
<p style="margin-bottom: 12pt;"><br>
</p>
<p class="MsoNormal" style="margin-bottom: 12pt;"><br>
</p>
<p class="MsoNormal" style="text-align: center;" align="center"><b><span
style="color: black;">New Publications</span></b></p>
<p>&nbsp;</p>
<p><span style="color: black;">(1) - (2) Brown Laboratory for Linguistic
Information Processing (BLLIP) North American News Text</span> contains
Penn Treebank-style parses of the text in the <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T21">North
American News Text Corpus (LDC95T21)</a>. The North American News Text
Corpus consists of English news text from the Los Angeles Times-Washington
Post (1994-1997), the New York Times (1994-1996), Reuters News Service
(1994-1996) and the Wall Street Journal (1994-1996).</p>
<p>The BLLIP North American News Text release is available in two versions:
<a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T13">BLLIP
North American News Text, Complete (LDC2008T13)</a>, a Members-Only
corpus that contains sentences from all sources in the North American
News Text Corpus; and <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T14">BLLIP
North American News Text, General Release (LDC2008T14)</a>, a corpus
available to nonmembers that does not include the Wall Street Journal
data from the North American News Text Corpus.</p>
<p class="MsoNormal">The data in this release was parsed into Penn
Treebank-style
parse trees using a re-ranking parser developed by Eugene Charniak and
Mark
Johnson. The Charniak-Johnson parser is statistically based and
uses a
generative first stage followed by a discriminative second stage. Both
stages
were trained on the Wall Street Journal data in <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T7">Treebank-2
(LDC95T7)</a> and <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC99T42">Treebank-3
(LDC99T42)</a>. <br>
<br>
To produce BLLIP North American News Text, the Charniak-Johnson parser
used a simplified context-free grammar in the first stage to generate a
set of <em>n</em>-best parses. Those parses were then pruned by
eliminating the parses at the edges of the distribution. In the second
stage, a maximum-entropy-based parser using a complete grammar was
applied. The output trees are ranked in order of probability. The parses
in BLLIP North American News Text include constituency and POS tagging
information for each of the 50 best parses of each sentence. Each file
contains a sequence of n-best lists; an n-best list holds the top n
parses of a sentence, each with its parser probability and re-ranker
score. <br>
</p>
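<p>The n-best list layout described above lends itself to a simple reader.
The following Python sketch is illustrative only and assumes a
hypothetical plain-text layout: a header line giving the number of parses
and a sentence ID, then, for each parse, one line with the re-ranker
score and the parser log probability followed by one line holding the
bracketed tree. The names <code>Parse</code>, <code>NBestList</code> and
<code>read_nbest_lists</code> are invented for this example; check the
corpus documentation for the actual file format before relying on it.</p>

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class Parse:
    reranker_score: float  # discriminative (second-stage) score
    parser_logprob: float  # generative (first-stage) log probability
    tree: str              # Penn Treebank-style bracketed tree, kept as text


@dataclass
class NBestList:
    sentence_id: str
    parses: List[Parse]

    def best(self) -> Parse:
        # Pick the parse with the highest re-ranker score; break ties on the
        # first-stage log probability.
        return max(self.parses, key=lambda p: (p.reranker_score, p.parser_logprob))


def read_nbest_lists(stream: TextIO) -> Iterator[NBestList]:
    """Yield one NBestList per sentence.

    ASSUMED layout (hypothetical; verify against the corpus documentation):
      "N SENTENCE_ID"             header line giving the number of parses
      "RERANKER_SCORE LOGPROB"    one score line per parse,
      "(S1 ...)"                  followed by that parse's tree on one line
    """
    # Strip whitespace and drop blank separator lines between lists.
    lines = (line.strip() for line in stream)
    lines = (line for line in lines if line)
    for header in lines:
        count_text, sentence_id = header.split(None, 1)
        parses = []
        for _ in range(int(count_text)):
            score_text, logprob_text = next(lines).split()
            tree = next(lines)
            parses.append(Parse(float(score_text), float(logprob_text), tree))
        yield NBestList(sentence_id, parses)


if __name__ == "__main__":
    # Tiny made-up sample in the assumed layout (not real corpus data).
    sample = io.StringIO(
        "2 sent-0001\n"
        "-1.5 -60.2\n"
        "(S1 (NP (NNP Test)))\n"
        "-0.7 -61.0\n"
        "(S1 (FRAG (NN Test)))\n"
    )
    for nbest in read_nbest_lists(sample):
        print(nbest.sentence_id, nbest.best().tree)
```

<p>The <code>best()</code> method selects by re-ranker score instead of
trusting list order, so it still works if the lists in a file are not
sorted.</p>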
<p style="margin-bottom: 12pt;"><br>
</p>
<p style="margin-bottom: 12pt; text-align: center;" align="center"><b>*</b></p>
<p><span style="color: black;">(3) - (4) North American News Text</span>
is a collection of English news text from the Los Angeles Times,
Washington Post, New York Times, Reuters and the Wall Street Journal. This
corpus was originally released in 1995 as the <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T21">North
American News Text Corpus (LDC95T21)</a> and has been reissued to
complement the release of the Brown Laboratory for Linguistic Information
Processing (BLLIP) North American News Text sets (LDC2008T13, LDC2008T14),
which consist of Penn Treebank-style parses of that news text.</p>
<p>North American News Text is reissued in two versions: <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T15">North
American News Text, Complete (LDC2008T15)</a>, the Members-Only original
version, now available as a 2008 Membership Year corpus; and <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T16">North
American News Text, General Release (LDC2008T16)</a>, a corpus available
to nonmembers that does not include text from the Wall Street Journal.
The directory structure of each of these publications has been
reorganized to match that of the BLLIP releases. Once decompressed with
the GNU gunzip utility, each data file consists of plain ASCII text with
SGML tags that mark article boundaries and the organization of
information within each article. <br>
</p>
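<p>As a rough sketch of reading one of these files, the Python example
below decompresses a data file in memory and splits it into articles. It
assumes the files are gzip-format (<code>.gz</code>) and that each article
is wrapped in <code>DOC</code> tags with a <code>DOCID</code> element; both
are assumptions to verify against the corpus documentation, and the
function name <code>iter_articles</code> is hypothetical. Note that
gunzip can also expand older <code>.Z</code> files, which Python's
<code>gzip</code> module cannot read.</p>

```python
import gzip
import re
from typing import Iterator, Optional, Tuple

# ASSUMPTION: each article is wrapped in DOC tags and carries a DOCID element.
# Adjust these patterns to the tag set given in the corpus documentation.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCID_RE = re.compile(r"<DOCID>\s*(.*?)\s*</DOCID>", re.DOTALL)


def iter_articles(path: str) -> Iterator[Tuple[Optional[str], str]]:
    """Decompress one gzip data file and yield (doc_id, sgml_body) per article."""
    # Opening the gzip file in text mode replaces a separate gunzip step.
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        text = handle.read()
    for match in DOC_RE.finditer(text):
        body = match.group(1)
        id_match = DOCID_RE.search(body)
        # Articles without a DOCID element are reported with doc_id None.
        yield (id_match.group(1) if id_match else None, body)
```

<p>Reading each file into memory keeps the sketch short; a streaming
reader would be preferable for very large files.</p>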
<br>
<hr size="2" width="100%">
<div align="center"><font face="Courier New, Courier, monospace"><small><small><big><br>
Ilya
Ahtaridis<br>
Membership Coordinator</big><br>
<br>
</small>--------------------------------------------------------------------</small><small><br>
</small></font></div>
<div align="center">
<pre class="moz-signature" cols="72"><font
face="Courier New, Courier, monospace">Linguistic Data Consortium Phone: (215) 573-1275
University of Pennsylvania Fax: (215) 573-2175
3600 Market St., Suite 810 <a
class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a
class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></font></pre>
</div>
<br>
</body>
</html>