<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
  <title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Hi John,<br>
<br>
You might wish to consider the following HUB4 and TDT (Topic Detection
and Tracking) resources distributed by the LDC. These data sets contain
substantial quantities of recent broadcast news in several languages,
segmented into individual stories and time-aligned with verbatim
transcripts.
<br>
<br>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC97S66">LDC97S66</a> </td>
      <td>1996 English Broadcast News Dev and Eval (Hub-4)</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC97S44">LDC97S44</a> </td>
      <td>1996 English Broadcast News Speech (Hub-4)</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC97T22">LDC97T22</a> </td>
      <td>1996 English Broadcast News Transcripts (Hub-4)</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC98S71">LDC98S71</a> </td>
      <td>1997 English Broadcast News Speech (Hub-4)</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC98T28">LDC98T28</a> </td>
      <td>1997 English Broadcast News Transcripts (Hub-4)</td>
    </tr>
  </tbody>
</table>
<br>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC2002S11">LDC2002S11</a>
      </td>
      <td>1997 HUB4 English Evaluation Speech and Transcripts</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC98S73">LDC98S73</a> </td>
      <td>1997 Mandarin Broadcast News Speech (Hub-4NE)</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC98T24">LDC98T24</a> </td>
      <td>1997 Mandarin Broadcast News Transcripts (Hub-4NE)</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC98S74">LDC98S74</a> </td>
      <td>1997 Spanish Broadcast News Speech (Hub-4NE)</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC98T29">LDC98T29</a> </td>
      <td>1997 Spanish Broadcast News Transcripts (Hub-4NE)</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC2000S86">LDC2000S86</a>
      </td>
      <td>1998 HUB-4 Broadcast News Evaluation English Test Material</td>
    </tr>
  </tbody>
</table>
<br>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC2000S92">LDC2000S92</a>
      </td>
      <td>TDT2 Careful Transcription Audio</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC2000T44">LDC2000T44</a>
      </td>
      <td>TDT2 Careful Transcription Text</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC99S84">LDC99S84</a> </td>
      <td>TDT2 English Audio</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC2001S93">LDC2001S93</a>
      </td>
      <td>TDT2 Mandarin Audio Corpus</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC2001T57">LDC2001T57</a>
      </td>
      <td>TDT2 Multilanguage Text Version 4.0</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC2001S94">LDC2001S94</a>
      </td>
      <td>TDT3 English Audio</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC2001S95">LDC2001S95</a>
      </td>
      <td>TDT3 Mandarin Audio</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC2001T58">LDC2001T58</a>
      </td>
      <td>TDT3 Multilanguage Text Version 2.0</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC2005S11">LDC2005S11</a>
      </td>
      <td>TDT4 Multilingual Broadcast News Speech Corpus</td>
    </tr>
    <tr>
      <td><a href="catalogEntry.jsp?catalogId=LDC2005T16">LDC2005T16</a>
      </td>
      <td>TDT4 Multilingual Text and Annotations</td>
    </tr>
  </tbody>
</table>
<br>
You can view our entire online catalog at:<br>
<br>
<a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu/Catalog/">http://www.ldc.upenn.edu/Catalog/</a><br>
<br>
Kind regards,<br>
<br>
Ilya<br>
<br>
Bryar Family wrote:
<blockquote cite="mid006a01c5e535$21b62620$2302a8c0@DadIGNYTMachine"
 type="cite">
  <pre wrap="">Hello:

I'm developing a project for rapid identification and categorization of
audio news clips, with a "target communities" focus. Are there any public
corpora available that consist of individual audio news stories of recent
vintage? (last 5-10 years)

I'd also be interested in corresponding with any members of the list who are
developing content categorization strategies for such audio content. For
example, if there are any members of the list who are involved with the
NewsML project, I'd like to hear from them. 

John V "Jack" Bryar
Managing Partner and acting CTO,
MilkBottleNews Partners
Direct: 802-843-6033
<a class="moz-txt-link-abbreviated" href="mailto:jack@milkbottlenews.com">jack@milkbottlenews.com</a>

  </pre>
</blockquote>
<br>
<pre class="moz-signature" cols="72">-- 


Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium                    Phone: (215) 573-1275
University of Pennsylvania                    Fax:   (215) 573-2175
3600 Market St., Suite 810                        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104                     <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>
</pre>
</body>
</html>