15.1419, Diss: Computational Ling: Kan: 'Automatic text...'

LINGUIST List linguist at linguistlist.org
Wed May 5 14:42:01 UTC 2004


LINGUIST List:  Vol-15-1419. Wed May 5 2004. ISSN: 1068-4875.

Subject: 15.1419, Diss: Computational Ling: Kan: 'Automatic text...'

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Sheila Collberg, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Tomoko Okuno <tomoko at linguistlist.org>
==========================================================================
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
=================================Directory=================================

1)
Date:  Tue, 4 May 2004 21:28:28 -0400 (EDT)
From:  kanmy at comp.nus.edu.sg
Subject:  Automatic text summarization as applied to information retrieval...

-------------------------------- Message 1 -------------------------------




Institution: Columbia University
Program: Natural Language Group, Department of Computer Science
Dissertation Status: Completed
Degree Date: 2002

Author: Min-Yen Kan

Dissertation Title:
Automatic text summarization as applied to information retrieval:
Using indicative and informative summaries

Dissertation URL: http://www.comp.nus.edu.sg/~kanmy/papers/thesis.pdf

Linguistic Field: Computational Linguistics

Dissertation Director 1: Kathleen R. McKeown
Dissertation Director 2: Judith L. Klavans


Dissertation Abstract:

I identify weaknesses in the standard "ranked list of documents"
information retrieval user interface by examining the search process
as performed in the traditional library by professional librarians and
catalogers.  I distill these processes into a list of core strategies
that multidocument summaries can fulfill effectively, assisting in
both searching and browsing.  This thesis implements automatic text
summarization components that provide an alternative way of presenting
search results from IR frameworks.

As a post-processor of results coming from a search framework,
Centrifuser implements these principles by producing both informative
and indicative summaries that aid the user in information seeking
tasks.  Centrifuser uses novel techniques in analyzing source articles
as a nested tree of topics, which allows the system to compare and
contrast discussions of common topics across documents, and to
identify rare topics.  Documents similar in topic distribution are
grouped together to enable faster and more accurate relevance
judgment.
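
[Illustrative note, not part of the original abstract.] The idea of
comparing topics across documents and flagging rare ones can be
sketched roughly as follows. This is a minimal sketch under stated
assumptions, not Centrifuser's actual method: the function name
classify_topics, the two thresholds, the sample data, and the
representation of a nested topic as a (parent, child) path are all
hypothetical.

```python
from collections import Counter


def classify_topics(doc_topics, common_min=0.5, rare_max=0.25):
    """Split topics into common and rare by the share of documents covering them.

    doc_topics maps a document id to a list of topic paths, where each path
    is a tuple such as ("angina", "treatment") naming a node in a topic tree.
    The thresholds are hypothetical, chosen only for illustration.
    """
    n_docs = len(doc_topics)
    # Count how many documents mention each topic; set() ignores repeats
    # of the same topic within a single document.
    doc_freq = Counter(t for topics in doc_topics.values() for t in set(topics))
    common = sorted(t for t, c in doc_freq.items() if c / n_docs >= common_min)
    rare = sorted(t for t, c in doc_freq.items() if c / n_docs <= rare_max)
    return common, rare


# Hypothetical search results, each reduced to the topic paths it covers.
docs = {
    "doc_a": [("angina", "symptoms"), ("angina", "treatment")],
    "doc_b": [("angina", "symptoms"), ("angina", "diagnosis")],
    "doc_c": [("angina", "symptoms"), ("angina", "treatment")],
    "doc_d": [("angina", "symptoms"), ("angina", "prognosis")],
}
common, rare = classify_topics(docs)
print("common:", common)
print("rare:", rare)
```

With this sample data, the topics covered by at least half of the
documents are reported as common, and those covered by a single
document as rare.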

A novel contribution in Centrifuser is the focus on generating
indicative summaries.  I analyze two sources of indicative summaries
-- online public access catalog summaries as well as annotated
bibliography entries -- by examining guidelines for writing such
summaries and by cataloging types of information used in actual
summary corpora. The study reveals that metadata, such as the purpose
or audience of a resource, are important inclusions in indicative
summaries.  By using the study's results, I derive an algorithm that
enables Centrifuser to author indicative summaries that both utilize
and include metadata, a novel contribution in the summarization field.

To enhance the quality and the variety of summaries that are produced,
I have employed novel techniques in natural language generation.  The
system analyzes documents using a two-part method: high-level content
planning deduces what semantic predicates to include and where to
place them, and a low-level realization model computes the most
appropriate phrasing for each predicate using both local and global
context.
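
[Illustrative note, not part of the original abstract.] A two-part
pipeline of this general kind might look like the sketch below. It
assumes a fixed predicate ordering and a small set of sentence
templates; the names plan_content, realize, and TEMPLATES, the
predicates, and the wording are all hypothetical and are not taken
from the thesis. Here "local context" is simply the predicate's own
value, and "global context" is the set of sentence openers already
used in the summary.

```python
# Hypothetical phrasings for each predicate, in order of preference.
TEMPLATES = {
    "topic": ["The documents discuss {}.", "Their main subject is {}."],
    "audience": ["They are written for {}.", "The intended readers are {}."],
    "length": ["They are {} in length.", "Their length is {}."],
}


def plan_content(facts):
    """High-level planning: choose which predicates to include and their order."""
    order = ["topic", "audience", "length"]  # hypothetical fixed ordering
    return [(name, facts[name]) for name in order if name in facts]


def realize(plan):
    """Low-level realization: pick one phrasing per predicate.

    A template is skipped if its first word already opened an earlier
    sentence; if every template is ruled out, the last one is used.
    """
    sentences = []
    used_openers = set()  # global context: openers used so far
    for name, value in plan:
        for template in TEMPLATES[name]:
            opener = template.split()[0]
            if opener not in used_openers:
                break
        used_openers.add(opener)
        sentences.append(template.format(value))  # local context: the value
    return " ".join(sentences)


facts = {"length": "short", "topic": "angina", "audience": "patients"}
summary = realize(plan_content(facts))
print(summary)
```

Keeping the two stages separate means the ordering rules and the
phrasing rules can be changed independently of each other.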

---------------------------------------------------------------------------
LINGUIST List: Vol-15-1419


