18.2645, Diss: Computational Ling/Text & Corpus Ling: Hasler: 'From Extracts...'

Tue Sep 11 17:52:13 UTC 2007

LINGUIST List: Vol-18-2645. Tue Sep 11 2007. ISSN: 1068 - 4875.

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Hunter Lockwood <hunter at linguistlist.org>
================================================================  

===========================Directory==============================  

1)
Date: 10-Sep-2007
From: Laura Hasler < L.Hasler at wlv.ac.uk >
Subject: From Extracts to Abstracts: Human summary production operations for computer-aided summarisation

-------------------------Message 1 ---------------------------------- 
Date: Tue, 11 Sep 2007 13:50:43
From: Laura Hasler [L.Hasler at wlv.ac.uk]
Subject: From Extracts to Abstracts: Human summary production operations for computer-aided summarisation

Institution: University of Wolverhampton 
Program: School of Humanities, Languages and Social Sciences 
Dissertation Status: Completed 
Degree Date: 2007 

Author: Laura Hasler

Dissertation Title: From Extracts to Abstracts: Human summary production
operations for computer-aided summarisation 

Dissertation URL:  http://clg.wlv.ac.uk/papers/hasler-thesis.pdf

Linguistic Field(s): Computational Linguistics
                     Text/Corpus Linguistics

Dissertation Director(s):
Michael Hoey
Ruslan Mitkov
Constantin Orasan

Dissertation Abstract:

This thesis is concerned with the field of computer-aided summarisation
(CAS), which has emerged at the confluence of the separate but related
fields of human and automatic summarisation. Because automatically produced
extracts suffer from poor readability and coherence, CAS is a viable
working alternative to fully automatic summarisation: it allows a human
summariser to post-edit automatically produced extracts to improve their
readability and coherence. To make the best use of CAS, reliable ways of
improving the coherence and readability of extracts when transforming them
into abstracts must be established.

To achieve this, a corpus-based analysis of the operations a human
summariser applies to extracts to transform them into abstracts is
presented. The corpus developed here consists of pairs of news texts
annotated for important information (i.e., human-produced extracts) and the
human-produced abstracts corresponding to these extracts. The creation of
this corpus simulates the computer-aided summarisation process to enable a
reliable investigation into the operations used. A detailed classification
of human summary production operations is proposed, with examples which
highlight the common linguistic realisations and functions of the
operations identified in the corpus. The classification is then used as a
basis for guidelines which can be given to users of computer-aided
summarisation systems in order to ensure that the summaries they produce
are of a consistently high quality.

To evaluate the human summary production operations, they are applied to
extracts using the guidelines. Evaluation is performed using a metric
developed for Centering Theory, a discourse theory of local coherence and
salience; this use of the metric constitutes a new evaluation method. This
is appropriate because existing methods of evaluating summaries are
unsuitable. A set of both automatic and human-produced extracts and their
corresponding abstracts is evaluated, and a comparison is made with
evaluations given by a human judge. The evaluation shows that when the
operations are applied to extracts using the guidelines, there is an
improvement in the readability and coherence of the resulting abstracts. 
