<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <hr size="2" width="100%">
    <div align="center"><tt><i style="mso-bidi-font-style:normal"><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">In
            this newsletter:</span></i></tt><br>
      <tt><i style="mso-bidi-font-style:normal"><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></i><b><i
            style="mso-bidi-font-style:normal"><span
              style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></i></b>
      </tt><br>
      <b><tt><span
style="font-size:11.0pt;mso-fareast-font-family:Calibri;mso-bidi-font-weight:bold"><span
              style="mso-list:Ignore">-<span style="font:7.0pt
                "Times New Roman"">  </span></span></span><a
            href="#google">LDC and Google Collaboration Results in New
            Syntactically-Annotated Language Resources</a><span
            style="font-size:11.0pt;mso-ascii-font-family:
            Calibri;mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;
            mso-hansi-theme-font:minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:






            minor-latin"><span style="mso-spacerun:yes">  -</span></span></tt></b><br>
      <b> <tt><span style="font-size:11.0pt;mso-ascii-font-family:
            Calibri;mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;
            mso-hansi-theme-font:minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:






            minor-latin"><o:p></o:p></span> </tt></b><br>
      <b> <tt><span
style="font-size:11.0pt;mso-fareast-font-family:Calibri;mso-bidi-font-weight:bold"><span
              style="mso-list:Ignore"><span style="font:7.0pt
                "Times New Roman""></span></span></span><span
            style="font-size:11.0pt;mso-ascii-font-family:
            Calibri;mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;
            mso-hansi-theme-font:minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:






            minor-latin"><span style="mso-spacerun:yes"> </span>-  </span><a
            href="#20th">The Future of Language Resources: LDC 20th
            Anniversary Workshop</a><span style="mso-spacerun:yes"></span><span
            style="font-size:11.0pt;mso-ascii-font-family:
            Calibri;mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;
            mso-hansi-theme-font:minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:






            minor-latin"><span style="mso-spacerun:yes">  -</span></span></tt></b><br>
      <b> <tt><span style="font-size:11.0pt;mso-ascii-font-family:
            Calibri;mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;
            mso-hansi-theme-font:minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:






            minor-latin"><o:p></o:p></span> </tt></b><br>
      <b> <tt><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin">-  </span><a href="#scholar">Fall 2012 LDC Data
            Scholarship Program</a><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin">  -</span></tt></b><br>
      <b> <tt><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin"><o:p></o:p></span> </tt></b><br>
      <b> <tt><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin"><o:p></o:p></span> </tt></b><br>
      <tt><i style="mso-bidi-font-style:normal"><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">New






            publications:</span></i></tt><br>
      <tt><i style="mso-bidi-font-style:normal"><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></i><b><i
            style="mso-bidi-font-style:normal"><span
              style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></i></b>
      </tt><br>
      <tt><span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
          minor-latin">LDC2012T13</span></tt><br>
      <tt><b><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin">-  </span></b><b><a href="#webtb">English Web
            Treebank</a><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin">  -</span></b></tt><br>
      <tt><b><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin"><o:p></o:p></span></b> </tt><br>
      <tt><span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
          minor-latin"><o:p> </o:p>LDC2012T14</span></tt><br>
      <b><tt><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin">-  </span><a href="#gale">GALE Phase 2 Arabic
            Broadcast Conversation Parallel Text Part 2</a><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin">  -</span></tt></b><br>
      <tt><b><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin"><o:p></o:p></span></b> </tt><br>
      <tt><span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
          minor-latin"><o:p> </o:p>LDC2012T12</span></tt><br>
      <b><tt><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin">-  </span><a href="#time">Spanish TimeBank 1.0</a><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin">  -</span><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin"><o:p> </o:p></span></tt></b><br>
      <tt><b><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin"><o:p></o:p></span></b></tt><tt><b><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin"><o:p> </o:p></span></b></tt>
    </div>
    <tt> <b><span
          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
          minor-latin"><o:p> </o:p></span></b></tt><tt><b><span
          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
          minor-latin"><o:p></o:p></span></b> </tt><br>
    <hr size="2" width="100%"> <tt><b><span
          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
          minor-latin"><br>
        </span></b><b><span
          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
          minor-latin"><br>
        </span></b></tt>
    <div align="center"><tt><a name="google"></a> <b><span
            style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
            minor-latin">LDC and Google Collaboration Results in New
            Syntactically-Annotated Language Resources</span></b></tt><br>
    </div>
    <tt><b><span
          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
          minor-latin"></span></b> <b><span
          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
          minor-latin"><o:p></o:p></span></b> </tt><tt><br>
      <span style="mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin">Google Inc.<span
          style="mso-spacerun:yes">  </span>and the Linguistic Data
        Consortium (LDC) have collaborated to develop new
        syntactically-annotated language resources that enable computers
        to better understand human language. The project, funded<b
          style="mso-bidi-font-weight:normal"> </b>through a gift from
        Google in 2010, has resulted in the development of the <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T13">English
          Web Treebank (LDC2012T13)</a>, which contains over 250,000 words of
        weblogs, newsgroups, email, reviews and question-answers
        manually annotated for syntactic structure. This resource will
        allow language technology researchers to develop and evaluate
        the robustness of parsing methods in various new web domains. It
        was used in the 2012 shared task on parsing English web text for
        the <a href="https://sites.google.com/site/sancl2012/">First
          Workshop on Syntactic Analysis of Non-Canonical Language
          (SANCL)</a>, which took place at NAACL-HLT in Montreal on June
        8, 2012. The English Web Treebank is available to the research
        community through <a href="http://www.ldc.upenn.edu/Catalog/">LDC’s
          Catalog</a>.<br>
        <o:p></o:p></span> <br>
      <span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
        minor-latin">Natural language processing (NLP) is a field of
        computational linguistic research concerned with the
        interactions between human language and computers. Parsing is a
        discipline within NLP in which computers analyze text and
        determine its syntactic structure. While syntactic parsing is
        already practically useful, Google funded this effort to help
        the research community develop better parsers for web text. The
        web texts collected and annotated by LDC provide new, diverse
        data for training parsing systems. <br>
        <o:p></o:p></span> <br>
      <span style="mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin">Google chose LDC for
        this work based on the Consortium’s experience in creating
        syntactically annotated corpora, also known as treebanks.
        Treebanks are critically important to parsing research since
        they provide human-analyzed sentence structures that facilitate
        training and testing scenarios in NLP research. This work
        extends the existing relationship between LDC and Google.<span
          style="mso-spacerun:yes">  </span>LDC has published four
        other Google-developed data sets in the past six years: English,
        Chinese, Japanese and European language n-grams used principally
        for language modeling. <o:p></o:p></span> <br>
      <b><span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
          minor-latin"><o:p></o:p></span></b></tt><tt><b><span
          style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
          minor-latin"><o:p> </o:p></span></b> <br>
      <span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p></o:p></span></tt><tt><span
        style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p> </o:p></span>
    </tt><br>
    <div align="center"> <tt><span
          style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><span
            style="mso-spacerun:yes"></span></span> <a name="20th"></a><span
          style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><b>The



            Future of Language Resources: LDC 20th Anniversary Workshop
            <o:p></o:p></b></span> </tt><br>
      <tt><span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p> </o:p></span>
      </tt><br>
    </div>
    <tt><span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">LDC’s
20th



        Anniversary Workshop is rapidly approaching! The event will take
        place on the University of Pennsylvania’s campus on September
        6-7, 2012. <o:p></o:p></span> <br>
      <span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p> </o:p></span>
      <br>
      <span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Workshop
themes



        include: the developments in human language technologies and
        associated resources that have brought us to our current state;
        the language resources required by the technical approaches
        taken and the impact of these resources on HLT progress; the
        applications of HLT and resources to other disciplines including
        law, medicine, economics, the political sciences and psychology;
        the impact of HLTs and related technologies on linguistic
        analysis and novel approaches in fields as widespread as
        phonetics, semantics, language documentation, sociolinguistics
        and dialect geography; and the impact of any of these
        developments on the ways in which language resources are
        created, shared and exploited and on the specific resources
        required. <o:p></o:p></span> <br>
      <span style="font-size:11.0pt;mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;mso-hansi-font-family:Calibri;mso-hansi-theme-font:
minor-latin;mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><o:p> </o:p></span>
      <br>
      <span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
        minor-latin"></span></tt><tt><span
        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
        minor-latin">Please read more <a
          href="http://www.ldc.upenn.edu/About/20th_Anniversary_Workshop.html">here</a>.<br>
        <br>
      </span></tt><br>
    <tt><span style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
        minor-latin"> </span></tt><tt><br>
      <br>
    </tt>
    <div align="center"><tt><a name="scholar"><b><span
              style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin">Fall
2012





              LDC Data Scholarship Program</span></b></a> <b><span
            style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"></span></b><span
          style="mso-bookmark:data"></span><span
          style="mso-fareast-font-family:"Times New
          Roman";mso-bidi-font-family:Calibri;
          mso-bidi-theme-font:minor-latin"> <o:p></o:p></span> </tt><br>
      <br>
    </div>
    <tt><span style="mso-fareast-font-family:"Times New
        Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">Applications
        are now being accepted through September 17, 2012, 11:59PM EST for
        the Fall 2012 LDC Data Scholarship program! The LDC Data
        Scholarship program provides university students with access to
        LDC data at no cost. During previous program cycles, LDC has
        awarded no-cost copies of LDC data to over 20 individual
        students and student research groups.<br>
        <br>
        This program is open to students pursuing both undergraduate and
        graduate studies in an accredited college or university. LDC
        Data Scholarships are not restricted to any particular field of
        study; however, students must demonstrate a well-developed
        research agenda and a bona fide inability to pay. The selection
        process is highly competitive. <br>
        <br>
        The application consists of two parts: <br>
        <br>
      </span><span style="mso-fareast-font-family:"Times New
        Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
        (1) <b>Data Use Proposal</b>. Applicants must submit a proposal
        describing their intended use of the data. The proposal should
        state which data the student plans to use and how the data will
        benefit their research project, and should describe the
        proposed methodology or algorithm.</span><br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
      </span><br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
        Applicants should consult the </span> <span
        style="mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin"><a
          href="http://www.ldc.upenn.edu/Catalog/index.jsp"
          target="_blank"><span
            style="mso-fareast-font-family:"Times New
            Roman";color:#0000CC">LDC Corpus Catalog</span></a></span><span
        style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
        for a complete list of data distributed by LDC. Due to certain
        restrictions, a handful of LDC corpora are available only to members
        of the Consortium. Applicants are advised to select at most
        two datasets; students may apply for additional datasets
        during the following cycle once they have completed processing
        of the initial datasets and published or presented work in some
        juried venue.</span><br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
      </span><br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
        (2) <b>Letter of Support</b>. Applicants must submit one letter
        of support from their thesis adviser or department chair. The
        letter must confirm that the department or university lacks the
        funding to pay the full Non-member Fee for the data and verify
        the student's need for data.</span> <br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
        <br>
        For further information on application materials and program
        rules, please visit the </span><span
        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:
        minor-latin"><a
          href="http://www.ldc.upenn.edu/About/scholarships.html"
          target="_blank"><span
            style="mso-fareast-font-family:"Times New Roman";
            color:#0000CC">LDC Data Scholarship</span></a></span><span
        style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
        page. <br>
        <br>
        Students can email their applications to the </span><span
        style="mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin"><a
          href="mailto:datascholarships@ldc.upenn.edu"><span
            style="mso-fareast-font-family: "Times New
            Roman";color:#0000CC">LDC Data Scholarship program</span></a></span><span
        style="mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin;color:black">. Decisions will be
        sent by email from the same address.<br>
        <br>
        The deadline for the Fall 2012 program cycle is September 17,
        2012, 11:59PM EST.<br>
        <o:p></o:p></span> </tt><br>
    <br>
    <tt><b><span style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span></b></tt><tt><span
        style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"><br>
        <br>
      </span></tt><tt><b><span style="mso-fareast-font-family: "Times
          New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"><br>
        </span></b></tt>
    <div align="right">
      <div align="center"><tt><b><span style="mso-fareast-font-family:
              "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">
              New publications</span></b></tt><br>
        <br>
      </div>
      <tt><b><span style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span></b></tt></div>
    <tt><span style="mso-fareast-font-family: "Times New
Roman";mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"><o:p></o:p></span>
      <br>
      <a name="webtb"></a> <span style="mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin">(1) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T13">English
          Web Treebank</a> was developed by the Linguistic Data
        Consortium (LDC) with funding through a gift from Google Inc. It
        consists of over 250,000 words of English weblogs, newsgroups,
        email, reviews and question-answers manually annotated for
        syntactic structure and is designed to allow language technology
        researchers to develop and evaluate the robustness of parsing
        methods in those web domains. <br>
        <o:p></o:p></span> <br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">This
        release contains 254,830 word-level tokens and 16,624 sentence-level
        tokens of web text in 1,174 files annotated for sentence- and
        word-level tokenization, part-of-speech, and syntactic
        structure. The data is roughly evenly divided across five
        genres: weblogs, newsgroups, email, reviews, and
        question-answers. The files were manually annotated following
        the sentence-level tokenization guidelines for web text and the
        word-level tokenization guidelines developed for English
        treebanks in the </span><span
        style="mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin"><a
          href="http://projects.ldc.upenn.edu/gale/index.html"
          target="_blank"><span
            style="mso-fareast-font-family:"Times New
            Roman";color:#0000CC">DARPA GALE</span></a></span><span
        style="mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin;color:black"> project. Only text
        from the subject line and message body of posts, articles,
        messages and question-answers was collected and annotated.<br>
        <o:p></o:p></span> <br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span><span
        style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">Non-members
        may license this data by completing the </span><span
        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://www.ldc.upenn.edu/Membership/Agreements/licenses/genericlicense.pdf"
          target="_blank"><span
            style="mso-fareast-font-family:"Times New Roman";
            color:#0000CC">LDC User Agreement for Non-members</span></a></span><span
        style="mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin;color:black">. The agreement can
        be faxed to +1 215 573 2175 or scanned and emailed to
        ldc@ldc.upenn.edu. The first fifty copies of this publication are being
        made available at no charge. After the first fifty copies are
        distributed, the non-member fee of US$175 applies.<br>
        <br>
      </span><br>
    </tt><tt><span style="mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin;color:black"> </span><br>
    </tt>
    <div align="center"><tt><span
          style="mso-fareast-font-family:"Times New
          Roman";mso-bidi-font-family:Calibri;
          mso-bidi-theme-font:minor-latin;color:black">*</span></tt><br>
      <tt><span style="mso-fareast-font-family:"Times New
          Roman";mso-bidi-font-family:Calibri;
          mso-bidi-theme-font:minor-latin;color:black"></span></tt></div>
    <tt><span style="mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin;color:black"> </span><br>
      <a name="gale"></a> <span
        style="mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin;color:black">(2) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T14">GALE


          Phase 2 Arabic Broadcast Conversation Parallel Text Part 2</a>
        was developed by LDC. Along with other corpora, the parallel
        text in this release comprised training data for Phase 2 of the
        DARPA GALE (Global Autonomous Language Exploitation) Program.
        This corpus contains Modern Standard Arabic source text and
        corresponding English translations selected from broadcast
        conversation (BC) data collected by LDC between 2004 and 2007
        and transcribed by LDC or under its direction. <br>
        <o:p></o:p></span> <br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">GALE
        Phase 2 Arabic Broadcast Conversation Parallel Text Part 2 includes 29
        source-translation document pairs, comprising 169,488 words of
        Arabic source text and its English translation. Data is drawn
        from eight distinct Arabic programs broadcast between 2004 and
        2007 by Aljazeera, a regional broadcaster based in
        Doha, Qatar, and Nile TV, an Egyptian broadcaster. The programs
        in this release focus on current events.<br>
        <o:p></o:p></span> <br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">The
files



        in this release were transcribed by LDC staff and/or
        transcription vendors under contract to LDC in accordance with
        the </span><span
        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://projects.ldc.upenn.edu/gale/Transcription/Arabic-XTransQRTR.V2.pdf"
          target="_blank"><span
            style="mso-fareast-font-family:"Times New Roman";
            color:#0000CC">Quick Rich Transcription</span></a></span><span
        style="mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin;color:black"> guidelines
        developed by LDC. Transcribers indicated sentence boundaries in
        addition to transcribing the text. Data was manually selected
        for translation according to several criteria, including
        linguistic features, transcription features and topic features.
        The transcribed and segmented files were then reformatted into a
        human-readable translation format and assigned to translation
        vendors. Translators followed LDC's Arabic-to-English
        translation guidelines. Bilingual LDC staff performed quality
        control procedures on the completed translations. <br>
        <o:p></o:p></span> <br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span></tt><br>
    <br>
    <div align="center">*<br>
    </div>
    <tt><span style="mso-fareast-font-family:"Times New
        Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"><o:p></o:p></span>
      <br>
      <a name="time"></a> <span
        style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">(3)


        <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T12">Spanish



          TimeBank 1.0</a> was developed by researchers at </span><span
style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
          href="http://www.barcelonamedia.org/" target="_blank"><span
            style="mso-fareast-font-family: "Times New
            Roman";color:#0000CC">Barcelona Media</span></a></span><span
        style="mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin;color:black"> and consists of
        Spanish texts in the </span><span
        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
          href="http://clic.ub.edu/corpus/en/ancora" target="_blank"><span
            style="mso-fareast-font-family:"Times New
            Roman";color:#0000CC">AnCora corpus</span></a></span><span
        style="mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin;color:black"> annotated with
        temporal and event information according to the </span><span
        style="mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin"><a
          href="http://www.timeml.org/site/index.html" target="_blank"><span
            style="mso-fareast-font-family:"Times New Roman";
            color:#0000CC">TimeML specification language</span></a></span><span
        style="mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin;color:black">.<br>
        <o:p></o:p></span> <br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">Spanish
TimeBank



        1.0 contains stand-off annotations for 210 documents with over
        75,800 tokens (including punctuation marks) and 68,000 tokens
        (excluding punctuation). The source documents are news stories
        and fiction from the AnCora corpus.<br>
        <o:p></o:p></span> <br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">The
AnCora



        corpus is the largest multilayer annotated corpus of Spanish and
        Catalan. AnCora contains 400,000 words in Spanish and 275,000
        words in Catalan. The AnCora documents are annotated on many
        linguistic levels including structure, syntax, dependencies,
        semantics and pragmatics. That information is not included in
        this release, but it can be mapped to the present annotations.
        The corpus is freely available from the </span><span
        style="mso-bidi-font-family:
        Calibri;mso-bidi-theme-font:minor-latin"><a
          href="http://clic.ub.edu/ancora" target="_blank"><span
            style="mso-fareast-font-family:"Times New Roman";
            color:#0000CC">Centre de Llenguatge i Computació (CLiC)</span></a></span><span
        style="mso-fareast-font-family:"Times New
        Roman";mso-bidi-font-family:Calibri;
        mso-bidi-theme-font:minor-latin;color:black">.<br>
        <o:p></o:p></span> <br>
      <span style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black"></span><span
        style="mso-fareast-font-family:"Times New Roman";
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin;color:black">Non-members
        may license this data by completing the </span><span
        style="mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin"><a
href="http://www.ldc.upenn.edu/Membership/Agreements/licenses/genericlicense.pdf"
          target="_blank"><span
            style="mso-fareast-font-family:"Times New Roman";
            color:#0000CC">LDC User Agreement for Non-members</span></a></span></tt><span
      style="mso-fareast-font-family:"Times New
      Roman";mso-bidi-font-family:Calibri;
      mso-bidi-theme-font:minor-latin;color:black"><tt>. The agreement
        can be faxed to +1 215 573 2175 or scanned and emailed to
        ldc@ldc.upenn.edu. The publication is being made available at no charge.<br>
      </tt></span>
    <hr size="2" width="100%"><br>
    <pre class="moz-signature" cols="72"><tt>
--

Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------
Linguistic Data Consortium      Phone: 1 (215) 573-1275
University of Pennsylvania        Fax: 1 (215) 573-2175
3600 Market St., Suite 810            <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA     <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>


</tt></pre>
    <hr size="2" width="100%">
  </body>
</html>