<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

</head>

<body bgcolor="#ffffff" text="#000000">

<p align="center"><b>

-  Programmer Analyst Positions at LDC  -</b><br>

<font color="#000000"><b></b></font><b><br>

<br>

</b></p>

<p align="center"><font color="#000000">LDC2008T13 <br>

<b>-  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T13">BLLIP

North American News Text, Complete</a>  -<br>

</b>LDC2008T14 <br>

<b>-  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T14">BLLIP

North American News Text, General Release</a>  -<br>

<br>

</b><br>

LDC2008T15 <br>

<b>-  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T15">North

American News Text, Complete</a>  -</b><br>

LDC2008T16 <br>

<b>-  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T16">North

American News Text, General Release</a>  -<br>

<br>

</b></font></p>

<p align="center"><font color="#000000"><b>T</b></font><b>he Linguistic

Data

Consortium (LDC) would like to announce position openings for programmer

analysts and the availability of new publications.<br>

</b></p>

<hr size="2" width="100%">

<p align="center"><b><br>

</b></p>

<p style="margin-bottom: 12pt; text-align: center;" align="center"><b>Programmer

Analyst Positions at LDC<br>

</b></p>

<p style="margin-bottom: 12pt; text-align: center;" align="center"><br>

</p>

<p style="margin-bottom: 12pt;">The LDC at the University of Pennsylvania has several

immediate openings for full-time programmer analysts.</p>

<ul type="disc">

  <li class="MsoNormal" style="">  Programmer Analyst - Text and Speech

Annotation Support (#080725253)</li>

</ul>

<p style="margin-bottom: 12pt;"><br>

      Duties: This position will support LDC's

language resource creation projects by providing programming, technical and

research support in a lead capacity.  Primary responsibilities will be to

design, develop and implement programming solutions and oversee all technical

aspects of the projects; work with LDC's project managers, annotators,

programmers and clients to develop achievable plans for corpus or software

development and successfully execute them; write annotation tools, data

processing tools, web applications and other software necessary for the

projects; support annotation workflow; support end-users; and investigate

technical issues that arise during the life cycles of projects and provide

timely solutions to them as necessary.</p>

<ul type="disc">

  <li class="MsoNormal" style="">   Programmer Analyst - Arabic

Treebank (#080324301)</li>

</ul>

<p style="margin-bottom: 12pt;"><br>

      Duties: Same as above; this position will

primarily work on Arabic Treebank and other Arabic-related projects.

(Grammatical knowledge of Arabic and the ability to read it are highly

preferred for this position.)</p>

<ul type="disc">

  <li class="MsoNormal" style="">  Programmer Analyst - External

Relations (#080725188)</li>

</ul>

<p style="margin-bottom: 12pt;"><br>

      Duties: This position will support LDC's External

Relations Group by designing, developing, coding and providing support

for

LDC's business systems. The business systems support the organization's

membership and sales activities and time tracking; features include

invoicing,

member tracking and reporting functions.  This position will also

coordinate and prepare publications of language resources, such as video,

computer-readable speech, software and text data, used for human language

technology research and technology development.<br>

<br>

For further information on the duties and qualifications for these

positions,

or to apply online, please visit <a href="http://jobs.hr.upenn.edu/">http://jobs.hr.upenn.edu/</a>;

search postings for the reference numbers indicated above.<br>

<br>

Penn offers an excellent benefits package including medical/dental,

retirement

plans, tuition assistance and a minimum of 3 weeks paid vacation per

year.

The University of Pennsylvania

is an affirmative action/equal opportunity employer.  Positions

contingent

upon funding.<br>

<br>

For more information about LDC and the programs we support, visit <a

 href="http://www.ldc.upenn.edu/">http://www.ldc.upenn.edu/</a>.</p>

<p style="margin-bottom: 12pt;"><br>

</p>

<p class="MsoNormal" style="margin-bottom: 12pt;"><br>

</p>

<p class="MsoNormal" style="text-align: center;" align="center"><b><span

 style="color: black;">New Publications</span></b></p>


<p><span style="color: black;">(1) - (2) Brown Laboratory for

Linguistic

Information Processing (BLLIP) North American News Text</span> contains

Penn Treebank-style parses of

text from the <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T21">North

American News Text Corpus (LDC95T21)</a>. The North American News Text

Corpus

consists of English news text from the Los Angeles Times-Washington

Post

(1994-1997), the New York Times (1994-1996), Reuters News Service

(1994-1996)

and the Wall Street Journal (1994-1996).</p>

<p>BLLIP North American News Text is available in two versions:

<a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T13">BLLIP

North American News Text, Complete (LDC2008T13)</a>, a Members-Only

corpus that

contains sentences from all sources in the North American News Text

Corpus; and <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T14">BLLIP

North American News Text, General Release (LDC2008T14)</a>, a corpus

available

to nonmembers that does not include the Wall Street Journal data from

the North

American News Text Corpus.</p>

<p class="MsoNormal">The data in this release was parsed into Penn

Treebank-style

parse trees using a re-ranking parser developed by Eugene Charniak and

Mark

Johnson. The Charniak and Johnson parser is statistically based and

uses a

generative first stage followed by a discriminative second stage. Both

stages

were trained on the Wall Street Journal data in <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T7">Treebank-2

(LDC95T7)</a> and <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC99T42">Treebank-3

(LDC99T42)</a>.  <br>

<br>

To produce BLLIP North American News Text, the

Charniak-Johnson parser

used a simplified context-free grammar in the first stage to generate a

set of <em>n</em>-best parses. Those parses were then pruned by eliminating the

parses at

the edges of the distribution. In the second stage, a maximum

entropy-based

parser using a complete grammar was applied. The output trees are

ranked in

order of probability.  The parses in BLLIP North American News Text

include constituency and POS tagging information for each of the

50-best parses

of each sentence.  Each file contains a sequence of n-best lists. An

n-best list holds the top n parses of a sentence, each with its

corresponding parser probability and re-ranker score.  <br>

</p>
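<p>As a rough illustration of how an n-best list might be used, the sketch
below models one list in memory and compares the first-stage parser's
preferred parse with the re-ranker's. The class name, field names and sample
trees are hypothetical and do not reflect the on-disk file format of the
release; the sketch also assumes that the parser score is a log probability
and that a higher re-ranker score is better. Consult the corpus documentation
for the actual layout.</p>

<pre>
```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Parse:
    """One candidate parse (hypothetical in-memory form, not the file format)."""
    tree: str               # Penn Treebank-style bracketed tree
    parser_logprob: float   # first-stage score (assumed to be a log probability)
    reranker_score: float   # second-stage score (assumed: higher is better)


def best_by_parser(nbest: List[Parse]) -> Parse:
    """Return the parse the generative first stage prefers."""
    return max(nbest, key=lambda p: p.parser_logprob)


def best_by_reranker(nbest: List[Parse]) -> Parse:
    """Return the parse the discriminative second stage prefers."""
    return max(nbest, key=lambda p: p.reranker_score)


def pos_tags(tree: str) -> List[Tuple[str, str]]:
    """Extract (tag, word) pairs from the preterminals of a bracketed tree."""
    return re.findall(r"\(([^\s()]+) ([^\s()]+)\)", tree)


# Made-up two-parse list for a single sentence (a real list holds up to 50).
nbest = [
    Parse("(S1 (S (NP (DT The) (NN market)) (VP (VBD rose))))", -20.5, 3.2),
    Parse("(S1 (S (NP (DT The) (NN market) (NN rose))))", -19.8, 1.1),
]

print(best_by_parser(nbest).tree)
print(best_by_reranker(nbest).tree)
print(pos_tags(best_by_reranker(nbest).tree))
```
</pre>

<p>In this made-up example the two stages disagree, which is the situation
re-ranking is meant to handle: the second stage can promote a parse that the
first stage did not rank highest.</p>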

<p style="margin-bottom: 12pt;"><br>

</p>

<p style="margin-bottom: 12pt; text-align: center;" align="center"><b>*</b></p>

<p><span style="color: black;">(3) - (4) North American News Text</span>

is a

collection of English news text from the Los Angeles Times, Washington

Post,

New York Times, Reuters and the Wall Street Journal. This corpus was

originally

released in 1995 as the <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T21">North

American News Text Corpus (LDC95T21) </a>and is reissued to complement

the

release of the Brown Laboratory for Linguistic Information Processing

(BLLIP)

North American News Text sets (LDC2008T13, LDC2008T14), which consist

of Penn

Treebank-style parses of that news text.</p>

<p>North American News Text is reissued in two versions: <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T15">North

American News Text, Complete (LDC2008T15)</a>, the Members-Only

original

version, now available as a 2008 Membership Year corpus; and <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T16">North

American News Text, General Release (LDC2008T16)</a>, a corpus

available to

nonmembers, which does not include text from the Wall Street

Journal.

The directory structure of each of these publications has been

reorganized to

match that of the BLLIP releases.  The

text

content of each data file (after decompression with the GNU-unzip

utility)

consists of plain ASCII character data with SGML tags to indicate

article

boundaries and organization of information within each article. <br>

</p>

<br>

<hr size="2" width="100%">

<div align="center"><font face="Courier New, Courier, monospace"><small><small><big><br>

Ilya

Ahtaridis<br>

Membership Coordinator</big><br>

<br>

</small>--------------------------------------------------------------------</small><small><br>

</small></font></div>

<div align="center">

<pre class="moz-signature" cols="72"><font

 face="Courier New, Courier, monospace">Linguistic Data Consortium                     Phone: (215) 573-1275

University of Pennsylvania                       Fax: (215) 573-2175

3600 Market St., Suite 810                         <a

 class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

 Philadelphia, PA 19104 USA                   <a

 class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></font></pre>

</div>

<br>

</body>

</html>