<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

</head>

<body bgcolor="#ffffff" text="#000000">

<div align="center"><br>
LDC2008T03<br>

<b>-  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T03">ACE

2005 English SpatialML Annotations</a>  -</b><br>

<br>

LDC2008S01<br>

<b>-  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008S01">CSLU:

Portland Cellular Telephone Speech Version 1.3</a>  -</b><br>

<br>

LDC2008T01<br>

<b>-  <a
 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T01">Hungarian-English
Parallel Text, Version 1.0</a>  -</b><br>
<br>
<b>The Linguistic Data Consortium (LDC) is pleased to announce the
availability of three new publications.</b><br>
</div>

<br>

<hr size="2" width="100%">

<p class="MsoNormal" style="" align="center"><b>New Publications</b><br>

<br>

</p>

<p>(1)  The ACE (Automatic Content Extraction) program focuses on
developing automatic content extraction technology to support automatic
processing of human language in text form. The kinds of information
recognized and extracted from text include entities, values, temporal
expressions, relations and events. SpatialML is a markup language for
representing spatial expressions in natural language documents.
SpatialML focuses primarily on geography and culturally relevant
landmarks, rather than on biology, cosmology, geology, or other areas of
the spatial language domain. Its goal is to allow text collections to be
better integrated with resources that provide spatial information about
a domain, such as gazetteers, physical feature databases and mapping
services. In <a
 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T03">ACE
2005 English SpatialML Annotations</a>, the authors applied SpatialML
tags to the English training data (originally annotated for entities,
relations and events) in the <a
 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T06">ACE
2005 Multilingual Training Corpus</a> (LDC2006T06).</p>

<p>The main SpatialML tag is the PLACE tag. The central goal of
SpatialML is to map PLACE information in text to data from gazetteers
and other databases to the extent possible. To help establish such a
mapping, SpatialML uses semantic attributes such as country
abbreviations, abbreviations for country subdivisions and dependent
areas (e.g., US states), and geo-coordinates. LINK and PATH tags express
relations between places, such as inclusion relations and trajectories
of various kinds. Where possible, SpatialML draws on ISO and other
standards so that the scheme remains compatible with existing and future
corpora. The SpatialML guidelines are compatible with existing
guidelines for spatial annotation and with existing corpora within the
ACE research program.</p>

<br>

<p align="center">*<br>

</p>

<p>(2)  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008S01">CSLU:

Portland Cellular Telephone Speech Version 1.3</a> was

created by the Center for Spoken Language Understanding (CSLU) at the
OGI School of Science and Engineering, Oregon Health and Science
University, Beaverton, Oregon. It consists of cellular telephone speech
and corresponding transcripts: 7,571 utterances from 515 speakers who
called from cellular telephones in the Portland, Oregon area.</p>

<p>Speakers called the CSLU data collection system on cellular
telephones and were asked to repeat certain phrases and respond to other
prompts. Two prompt protocols were used: an In Vehicle Protocol for
speakers calling from inside a vehicle and a Not in Vehicle Protocol for
those calling from outside a vehicle. The protocols shared several
questions, but each also contained distinct queries designed to probe
the caller's surroundings (in vehicle or not in vehicle). Not every
caller responded to every prompt.</p>

<p>The text transcriptions were produced using the non-time-aligned
word-level conventions described in the CSLU Labeling Guide, which is
included in the documentation for this release. The corpus contains
both orthographic and phonetic transcriptions of the corresponding
speech files.</p>

<div align="center">*<br>

</div>

<br>

<p>(3)  <a
 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T01">Hungarian-English
Parallel Text, Version 1.0</a> (also known as the
"Hunglish Corpus") is a sentence-aligned Hungarian-English parallel
corpus of approximately two million sentence pairs. The release also
contains additional language resources for Hungarian, including a
monolingual corpus, a morphological toolset and an aligner.<span
 style="" lang="EN-US">  Hungarian-English Parallel Text, Version 1.0
is the joint work of the <a
 href="http://mokk.bme.hu/index_html-en?set_language=en&amp;cl=en">Media
Research and Education Center</a> at the <a href="http://www.bme.hu/en">Budapest
University of Technology and Economics (BUTE)</a> and the <a
 href="http://www.nytud.hu/depts/corpus/index.html">Corpus Linguistics
Department</a> at the Hungarian Academy of Sciences <a
 href="http://www.nytud.hu/eng/index.html">Institute of Linguistics</a>.</span></p>

<p>Sentence pair (.bi) files consist of tab-separated, matching sentence
pairs. The .bi files omit segments where deletion or contraction
occurred. They are also filtered by quality, so the raw texts cannot be
fully reconstructed from them. Some .bi files were also reordered
(sorted alphabetically), so their lines do not follow the order of the
original documents.</p>
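<p>As a rough illustration of the .bi layout, the following Python sketch
splits each line into a sentence pair. It is a minimal sketch and not
part of the release: the function names <code>parse_bi_lines</code> and
<code>read_bi_file</code> are hypothetical, and it assumes that the first
column is Hungarian and the second English (as in the .lad files) and
that the files are UTF-8 encoded. Check the release documentation for
the actual column order and character encoding.</p>
<pre><code class="language-python">def parse_bi_lines(lines):
    """Split .bi lines into (hungarian, english) sentence pairs.

    Assumption (hypothetical): column 1 is Hungarian and column 2 is
    English, mirroring the .lad layout.
    """
    pairs = []
    for lineno, line in enumerate(lines, start=1):
        # Drop only the line ending; the sentence text is kept as-is.
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines, e.g. a trailing one
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated fields, got {len(fields)}"
            )
        pairs.append((fields[0], fields[1]))
    return pairs


def read_bi_file(path, encoding="utf-8"):
    """Read a .bi file from disk (the default encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        return parse_bi_lines(f)
</code></pre>
<p>For example, <code>parse_bi_lines(["Szia!\tHi!\n"])</code> returns
<code>[("Szia!", "Hi!")]</code>. Because the pairs are quality-filtered and
some files are sorted alphabetically, the result is best treated as a
collection of independent pairs rather than as a running document.</p>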

<p>Alignment "ladder" (.lad) files preserve both input texts in full and
in their original order, including segments that were not successfully
aligned. Every line of a .lad file has two tab-separated columns: the
first holds a segment of the Hungarian text, and the second holds the
(supposedly) corresponding segment of the English text. A segment
usually consists of exactly one sentence on each side, but it can also
consist of zero sentences or of more than one.</p>
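<p>The following Python sketch shows one way to read a .lad file and,
because every segment is kept in order, to rebuild both texts from the
two columns. It is an illustration only: the function names
<code>parse_lad_lines</code> and <code>reconstruct_texts</code> are
hypothetical, and the sketch assumes exactly two columns per line, with
an empty column standing for a side that holds zero sentences. How
several sentences within one segment are delimited is not described
here, so such segments are kept as single strings.</p>
<pre><code class="language-python">def parse_lad_lines(lines):
    """Yield (hungarian_segment, english_segment) pairs from .lad lines.

    An empty column means that side of the ladder step holds zero sentences.
    """
    for lineno, line in enumerate(lines, start=1):
        # Strip only the line ending: str.strip() would also remove the
        # tab that separates an empty column from a non-empty one.
        line = line.rstrip("\r\n")
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated columns, got {len(fields)}"
            )
        yield fields[0], fields[1]


def reconstruct_texts(lines):
    """Rebuild the ordered Hungarian and English segment lists from a ladder.

    Since every segment is kept in its original order, reading each
    column top to bottom and skipping empty cells recovers both input
    texts at segment granularity.
    """
    hungarian, english = [], []
    for hu, en in parse_lad_lines(lines):
        if hu:
            hungarian.append(hu)
        if en:
            english.append(en)
    return hungarian, english
</code></pre>
<p>For instance, in a three-line ladder whose second step has only a
Hungarian segment and whose third step has only an English one,
<code>reconstruct_texts</code> returns two segments for each language,
in their original order.</p>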

<br>

<hr size="2" width="100%">

<div align="center"><small><font face="Courier New, Courier, monospace"><br>

Ilya

Ahtaridis</font></small><br>

<small><font face="Courier New, Courier, monospace">

Membership Coordinator</font></small></div>

<p>

</p>

<div align="center">--------------------------------------------------------------------<br>

</div>

<div align="center">

<pre class="moz-signature" cols="72">Linguistic Data Consortium                     Phone: (215) 573-1275

University of Pennsylvania                       Fax: (215) 573-2175

3600 Market St., Suite 810                         <a

 class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104 USA                  <a

 class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>

</div>

</body>

</html>