<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body bgcolor="#ffffff" text="#000000">
<div align="center"><br>
LDC2008T03<br>
<b>- <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T03">ACE
2005 English SpatialML Annotations</a> -</b><br>
<br>
LDC2008S01<br>
<b>- <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008S01">CSLU:
Portland Cellular Telephone Speech Version 1.3</a> -</b><br>
<br>
LDC2008T01<br>
<b>- <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T01">Hungarian-English
Parallel Text, Version 1.0</a> -<br>
<br>
The Linguistic Data Consortium (LDC) is pleased to announce the
availability of three new publications.<br>
</b></div>
<br>
<hr size="2" width="100%">
<p class="MsoNormal" style="" align="center"><b>New Publications</b><br>
<br>
</p>
<p>(1) The ACE (Automatic Content Extraction) program develops
technology to support automatic processing of human language in text
form. The kinds of information recognized and extracted from text
include entities, values, temporal expressions, relations and events.
SpatialML is a mark-up language for
representing spatial expressions in natural language documents.
SpatialML's focus is primarily on geography and culturally-relevant
landmarks, rather than biology, cosmology, geology, or other regions of
the spatial language domain. The goal is to allow for potentially
better integration of text collections with resources such as databases
that provide spatial information about a domain, including gazetteers,
physical feature databases and mapping services. In <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T03">ACE
2005 English SpatialML Annotations</a>, the authors applied SpatialML
tags to the English training data (originally annotated for entities,
relations and events) in <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T06">ACE
2005 Multilingual Training Corpus, LDC2006T06.</a></p>
<p>The main SpatialML tag is the PLACE tag. The central goal of
SpatialML is to map PLACE information in text to data from gazetteers
and other databases to the extent possible. Therefore, semantic
attributes such as country abbreviations, country subdivision and
dependent area abbreviations (e.g., US states), and geo-coordinates are
used to help establish such a mapping. LINK and PATH tags express
relations between places, such as inclusion relations and trajectories
of various kinds. To the extent possible, SpatialML leverages ISO and
other standards towards the goal of making the scheme compatible with
existing and future corpora. The SpatialML guidelines are compatible
with existing guidelines for spatial annotation and existing corpora
within the ACE research program. <br>
</p>
<br>
<p align="center">*<br>
</p>
<p>(2) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008S01">CSLU:
Portland Cellular Telephone Speech Version 1.3</a> was
created by the Center for Spoken Language Understanding (CSLU) at OGI
School of Science and Engineering, Oregon Health and Science
University, Beaverton, Oregon. It consists of cellular telephone speech
and corresponding transcripts, specifically, 7,571 utterances from 515
speakers who made calls in the Portland, Oregon area using cellular
telephones.</p>
<p>Speakers called the CSLU data collection system on cellular
telephones, and they were asked to repeat certain phrases and to
respond to other prompts. Two prompt protocols were used: an In Vehicle
Protocol for speakers calling from inside a vehicle and a Not in
Vehicle Protocol for those calling from outside a vehicle. The
protocols shared several questions, but each protocol contained
distinct queries designed to probe the conditions of the caller's
in-vehicle or not-in-vehicle surroundings. Not every caller provided a
response to each prompt. </p>
<p>The text transcriptions were produced using the non-time-aligned
word-level conventions described in The CSLU Labeling Guide, which is
included in the documentation for this release. The corpus contains
both orthographic and phonetic transcriptions of the corresponding
speech files. </p>
<div align="center">*<br>
</div>
<br>
(3) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T01">Hungarian-English
Parallel Text, Version 1.0</a> (also known as the
"Hunglish Corpus") is a sentence-aligned Hungarian-English parallel
corpus consisting of approximately two million sentence pairs. The
corpus contains additional language resources for the Hungarian text,
including a monolingual corpus, morphological toolset and aligner.<span
style="" lang="EN-US"> Hungarian-English Parallel Text, Version 1.0
is a joint work of the <a
href="http://mokk.bme.hu/index_html-en?set_language=en&cl=en">Media
Research and Education Center</a> at the <a href="http://www.bme.hu/en">Budapest
University of Technology and Economics (BUTE)</a> and the <a
href="http://www.nytud.hu/depts/corpus/index.html">Corpus Linguistics
Department</a> at the Hungarian Academy of Sciences <a
href="http://www.nytud.hu/eng/index.html">Institute of Linguistics</a>.
<br>
</span><br>
Sentence pair (.bi) files consist of tab-separated, matching sentence
pairs. The .bi files do not contain segments where deletion or
contraction occurred. They are also filtered for quality, so the raw
texts cannot be fully reconstructed from them. Some .bi files were
shuffled (sorted alphabetically).
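<p>As a rough illustration of the .bi layout described above, the
following Python sketch reads tab-separated sentence pairs. The function
name <code>read_sentence_pairs</code> is hypothetical, and the column
order (Hungarian first, English second) is an assumption carried over
from the .lad description; check the corpus documentation before relying
on it.</p>

```python
def read_sentence_pairs(lines):
    """Yield (hungarian, english) tuples from tab-separated .bi lines.

    Assumption: Hungarian is the first column and English the second.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        hungarian, tab, english = line.partition("\t")
        if not tab:
            continue  # no tab found: not a sentence pair, skip it
        yield hungarian, english


# Made-up sample lines, not taken from the corpus.
sample = ["Jó napot.\tGood day.\n", "\n", "no tab on this line\n"]
pairs = list(read_sentence_pairs(sample))
```

<p>In practice the same function could be fed an open file object,
since iterating over a text file also yields one line at a time.</p>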
<p>Alignment "ladder" (.lad) files preserve both input texts in full
and in their original order, including segments that were not
successfully aligned. In .lad files, every line is tab-separated into
two columns. The first is a segment of the Hungarian text. The second
is the (supposedly corresponding) segment of the English text. A
segment on either side usually consists of exactly one sentence, but it
can also consist of zero sentences or more than one. <br>
</p>
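<p>The two-column .lad layout can be explored with a short Python
sketch like the one below, which counts how many lines have text on both
sides and how many have text on only one side. The function names and
category labels are hypothetical, and the sketch assumes that a segment
of zero sentences appears as an empty column; the release documentation
is the authority on the actual encoding.</p>

```python
from collections import Counter


def classify_ladder_line(line):
    """Label one .lad line by which of its two columns contain text.

    Assumption: an unaligned segment shows up as an empty column.
    """
    hungarian, _, english = line.rstrip("\n").partition("\t")
    hungarian, english = hungarian.strip(), english.strip()
    if hungarian and english:
        return "aligned"
    if hungarian:
        return "hungarian_only"
    if english:
        return "english_only"
    return "empty"


def summarize_ladder(lines):
    """Count the lines in each category."""
    return Counter(classify_ladder_line(line) for line in lines)


# Made-up sample lines, not taken from the corpus.
sample = ["Szia.\tHi.\n", "Csak magyar.\t\n", "\tOnly English.\n"]
summary = summarize_ladder(sample)
```

<p>A summary like this gives a quick sense of how much of a text pair
was left unaligned, without saying anything about alignment quality.</p>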
<br>
<hr size="2" width="100%">
<div align="center"><small><font face="Courier New, Courier, monospace"><br>
Ilya
Ahtaridis</font></small><br>
<small><font face="Courier New, Courier, monospace">
Membership Coordinator</font></small></div>
<p>
</p>
<div align="center">--------------------------------------------------------------------<br>
</div>
<div align="center">
<pre class="moz-signature" cols="72">Linguistic Data Consortium Phone: (215) 573-1275
University of Pennsylvania Fax: (215) 573-2175
3600 Market St., Suite 810 <a
class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a
class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>
</div>
</body>
</html>