7.843, FYI: LDC release, Corpus tools, Ph.D. thesis, Kittredge lecture

Fri Jun 7 16:23:59 UTC 1996

---------------------------------------------------------------------------
LINGUIST List:  Vol-7-843. Fri Jun 7 1996. ISSN: 1068-4875. Lines:  226

Subject: 7.843, FYI: LDC release, Corpus tools, Ph.D. thesis, Kittredge lecture

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
            Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu> (On Leave)
            T. Daniel Seely: Eastern Michigan U. <dseely at emunix.emich.edu>

Associate Editor:  Ljuba Veselinova <lveselin at emunix.emich.edu>
Assistant Editors: Ron Reck <rreck at emunix.emich.edu>
                   Ann Dizdar <dizdar at tam2000.tamu.edu>
                   Annemarie Valdez <avaldez at emunix.emich.edu>

Software development: John H. Remmers <remmers at emunix.emich.edu>

Editor for this issue: dizdar at tam2000.tamu.edu (Ann Dizdar)

---------------------------------Directory-----------------------------------
1)
Date:  05 Jun 1996 10:43:30 +0200
From:  antoine.ogonowski at erli.fr ("Antoine Ogonowski")
Subject:  New Release from the LDC

2)
Date:  Mon, 03 Jun 1996 18:48:26 +0200
From:  ebert at anyway.gs.uni-heidelberg.de (Christian Ebert)
Subject:  Survey on Corpus Access Tools

3)
Date:  Wed, 05 Jun 1996 18:16:06 +0200
From:  lager at ling.gu.se (Torbjoern Lager)
Subject:  Announcement: Ph.D. Thesis: Comp. Corpus Linguistics

4)
Date:  Fri, 07 Jun 1996 10:29:31 +0200
From:  Bruno.Tersago at ccl.kuleuven.ac.be (Bruno Tersago)
Subject:  Lecture Richard Kittredge

---------------------------------Messages------------------------------------
1)
Date:  05 Jun 1996 10:43:30 +0200
From:  antoine.ogonowski at erli.fr ("Antoine Ogonowski")
Subject:  New Release from the LDC

From: LDC Office, Fri 31 May 1996 6:57 pm
Subject: New Release from the LDC
To: ldc-publicity at unagi.cis.upenn.edu
Cc: ldc at unagi.cis.upenn.edu

                Announcing a NEW RELEASE from the
                   LINGUISTIC DATA CONSORTIUM

            Acoustic-Phonetic Continuous Speech Corpus
                 Far Field Microphone Recordings

                             FFMTIMIT

The FFMTIMIT corpus contains the previously-unreleased secondary
microphone waveforms for the TIMIT Acoustic-Phonetic Continuous Speech
corpus.  The primary microphone waveforms, which were recorded using a
close-talking noise-cancelling head-mounted Sennheiser microphone
(model HMD-414), are available from the LDC on NIST Speech Disc 1-1.1
(LDC93S1).  The secondary microphone used in the recording of the
TIMIT corpus was a Bruel & Kjaer 1/2" free-field microphone (model
4165).

While the Sennheiser microphone recordings are relatively "clean" with
respect to non-speech noise, the FFMTIMIT recordings include
significant low frequency noise, which was due to the HVAC system and
mechanical vibration transmitted through the floor of the
double-walled sound booth used in recording.  Because it is noisier
than its TIMIT counterpart, the data of FFMTIMIT may be used in the
development of more noise-robust speech recognition systems.  In
addition, this data may be of value to researchers involved in vocal
tract modeling because the B&K microphone has extremely flat
free-field frequency response and calibration tones are provided.

Note that the B&K TIMIT data contained in this release has not been
processed through any highpass filter (e.g., the 1581-point filter
described in the paper "The DARPA Speech Recognition Research
Database" by Fisher, Doddington and Goudie-Marshall in "DARPA TIMIT
Acoustic-Phonetic Continuous Speech Corpus CD-ROM," NISTIR 4930 / NTIS
Order No. PB93-173938).
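
Since the data is distributed unfiltered, users who want to suppress
the low-frequency noise have to apply a highpass filter themselves.
The following is a minimal sketch of that step, not the 1581-point
filter cited above: it is a simple first-order (RC) highpass used as a
stand-in. The function name, the 16 kHz sampling rate, and the 100 Hz
cutoff are assumptions chosen for illustration and are not taken from
this announcement.

```python
import math


def highpass(samples, sample_rate_hz, cutoff_hz):
    """Apply a first-order (RC) highpass filter and return a new list.

    This is an illustrative stand-in, NOT the 1581-point filter
    referenced in the announcement.
    """
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # filter time constant
    dt = 1.0 / sample_rate_hz                # sampling interval
    alpha = rc / (rc + dt)
    out = [float(samples[0])]
    for i in range(1, len(samples)):
        # y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


# Assumed values, for illustration only.
SAMPLE_RATE_HZ = 16000
CUTOFF_HZ = 100.0

# A constant (DC) offset decays towards zero after filtering.
filtered = highpass([1.0] * SAMPLE_RATE_HZ, SAMPLE_RATE_HZ, CUTOFF_HZ)
```

A first-order filter rolls off gently, so a longer FIR design would be
the more likely choice where a sharp cutoff is needed.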

Institutions that have membership in the LDC during the 1996
Membership Year will be able to receive FFMTIMIT at no additional
charge, in the same manner as all other text and speech corpora
published by the LDC.

Nonmembers can receive a copy of FFMTIMIT for research purposes only
for a fee of $100. If you would like to order a copy of this corpus,
please email your request to ldc at unagi.cis.upenn.edu. If you need
additional information before placing your order, or would like to
inquire about membership in the LDC, please send email or call (215)
898-0464.

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL
http://www.cis.upenn.edu/~ldc. Information is also available via ftp
at ftp.cis.upenn.edu under pub/ldc; for ftp access, please use
"anonymous" as your login name, and give your email address when asked
for a password.
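
The anonymous ftp login described above can be sketched with Python's
standard ftplib module. The host and directory are the ones given in
this announcement; the function name, the ftp_factory parameter (which
only exists so that the sketch can be tried without a network
connection), and the example email address are hypothetical.

```python
from ftplib import FTP


def list_ldc_files(email, host="ftp.cis.upenn.edu",
                   directory="pub/ldc", ftp_factory=FTP):
    """Log in anonymously and return the file names in `directory`.

    `email` is sent as the password, as the announcement requests.
    """
    ftp = ftp_factory(host)          # connects when a host is given
    try:
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(directory)
        return ftp.nlst()            # plain list of names
    finally:
        ftp.close()


# Example call (requires network access; address is a placeholder):
# names = list_ldc_files("you@example.org")
```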
------------------------------------------------------------------------
2)
Date:  Mon, 03 Jun 1996 18:48:26 +0200
From:  ebert at anyway.gs.uni-heidelberg.de (Christian Ebert)
Subject:  Survey on Corpus Access Tools

As part of a two-semester software project at the Department of
Computational Linguistics, University of Heidelberg, Germany, we
intend to design and implement a general access tool for large text
corpora. In order to investigate users' needs and wishes concerning
such a tool, we provide the following questionnaire. It is addressed
to anyone doing linguistic work or research using text corpora. Your
future work may benefit from what we develop, so we kindly ask you to
help us design such a tool by filling out our questionnaire. Feel free
to add any comments you regard as useful or important to the subject
(including the questionnaire itself).

Our questionnaire is located at

    http://www.gs.uni-heidelberg.de/~ebert/quest.html

If you have any further questions, don't hesitate to email us:

    swp at novell1.gs.uni-heidelberg.de

Thank you in advance for your cooperation!

    Department of Computational Linguistics
    University of Heidelberg, Germany
    Karlstr. 2
    69125 Heidelberg

------------------------------------------------------------------------
3)
Date:  Wed, 05 Jun 1996 18:16:06 +0200
From:  lager at ling.gu.se (Torbjoern Lager)
Subject:  Announcement: Ph.D. Thesis: Comp. Corpus Linguistics

KEY WORDS: Corpus linguistics, Corpus tools, Grammar, Grammar
development

#### ####  Ph.D. Thesis Announcement
#### ####
#### ####  A LOGICAL APPROACH TO COMPUTATIONAL CORPUS LINGUISTICS
#### ####
#### ####  Torbjörn Lager

This is to announce the availability of my Ph.D. thesis: "A Logical
Approach to Computational Corpus Linguistics". I have prepared a WWW
page dedicated to the approach described in the thesis, from which
machine readable versions of the thesis may be downloaded, and hard
copies ordered. The relevant URL is:

   http://www.ling.gu.se/~lager/taglog.html

You may also send mail directly to me: lager at ling.gu.se

ABSTRACT

The purpose of this thesis is to build a *corpus theory development
environment* -- to discuss its design, use, and implementation. The
proposed system is based on a logical approach to computational corpus
linguistics where sentences of logic are used to express statements
about texts and logical inference is used to manipulate these
sentences in order to analyse the texts.
      The thesis demonstrates the remarkable ease with which the
functionalities needed in a corpus system can be implemented when
based upon adequate means of representing, querying, and
reasoning. The proposed system implements hand coding, searching,
concordancing, parsing, counting, tabling, collocating, automatic
part-of-speech tagging, lemmatizing, excerpting, interpreting,
treebanking, explanation, and various kinds of learning.
      By linking all this functionality into a common representational
framework characterised by high expressive power, declarativity, and
explicit reasoning strategies, and by embedding the whole concept in a
particular philosophical and methodological context, including an
ontology of text, an analysis of the notion of theory, an explication
of the notion of truth, and other foundational issues, we arrive at an
interactive system which is multi-functional and general, yet simple,
consistent, and highly usable.
      Apart from being interesting from a practical point of view, the
development of such a system raises intriguing philosophical and
methodological questions: What is a corpus text? What is a corpus
theory?  What does it mean to develop a corpus theory? What does it
mean for a corpus theory to be true about a corpus text? What is the
link between the truth of such a theory and its usefulness for natural
language processing purposes? These and related questions are discussed
in the thesis.
      The system exists in a prototype implementation and the thesis
contains numerous examples from this implementation in action.
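
As a rough illustration of the general idea in the abstract
(statements about a text stored as facts, with queries and simple
rules evaluated over them), here is a toy sketch in Python. It is not
the system described in the thesis and does not reproduce its
formalism; the Token type, the tag names, the sample sentence, and
both functions are hypothetical.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One fact about the text: a word at a position with a tag."""
    position: int
    word: str
    tag: str


# Hypothetical hand-coded facts about a five-word text.
FACTS = [
    Token(0, "the", "DET"),
    Token(1, "dog", "NOUN"),
    Token(2, "chased", "VERB"),
    Token(3, "a", "DET"),
    Token(4, "cat", "NOUN"),
]


def query(facts, **constraints):
    """Return every fact whose fields equal all given constraints."""
    return [f for f in facts
            if all(getattr(f, k) == v for k, v in constraints.items())]


def det_noun_pairs(facts):
    """A simple rule: a DET immediately followed by a NOUN."""
    by_position = {f.position: f for f in facts}
    pairs = []
    for det in query(facts, tag="DET"):
        nxt = by_position.get(det.position + 1)
        if nxt is not None and nxt.tag == "NOUN":
            pairs.append((det.word, nxt.word))
    return pairs


nouns = [t.word for t in query(FACTS, tag="NOUN")]   # ['dog', 'cat']
pairs = det_noun_pairs(FACTS)       # [('the', 'dog'), ('a', 'cat')]
```

Searching, counting, and concordancing can all be phrased as queries
over one such shared representation, which is the kind of uniformity
the abstract argues for.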


Torbjoern Lager                                  E-mail: lager at ling.gu.se
Department of Linguistics                        Phone: +46 31 7731175
University of Gothenburg                         Fax: +46 31 7734853
Renstroemsparken
412 98 Gothenburg
Sweden

------------------------------------------------------------------------
4)
Date:  Fri, 07 Jun 1996 10:29:31 +0200
From:  Bruno.Tersago at ccl.kuleuven.ac.be (Bruno Tersago)
Subject:  Lecture Richard Kittredge

------------------------------------------------------------------------
LINGUIST List: Vol-7-843.