20.2692, Diss: Comp Ling: Orasan: 'Comparative Evaluation of Modular...'

Wed Aug 5 14:25:31 UTC 2009

LINGUIST List: Vol-20-2692. Wed Aug 05 2009. ISSN: 1068 - 4875.

Subject: 20.2692, Diss: Comp Ling: Orasan: 'Comparative Evaluation of Modular...'

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews: Randall Eggert, U of Utah
         <reviews at linguistlist.org>

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Di Wdzenczny <di at linguistlist.org>

Date: 05-Aug-2009
From: Constantin Orasan <C.Orasan at wlv.ac.uk>
Subject: Comparative Evaluation of Modular Automatic Summarisation Systems Using CAST

-------------------------Message 1 ---------------------------------- 
Date: Wed, 05 Aug 2009 10:23:49
From: Constantin Orasan [C.Orasan at wlv.ac.uk]
Subject: Comparative Evaluation of Modular Automatic Summarisation Systems Using CAST

Institution: University of Wolverhampton 
Program: School of Humanities, Languages and Social Sciences 
Dissertation Status: Completed 
Degree Date: 2006 

Author: Constantin Orasan

Dissertation Title: Comparative Evaluation of Modular Automatic Summarisation
Systems Using CAST 

Dissertation URL:  http://clg.wlv.ac.uk/papers/orasan-thesis.php

Linguistic Field(s): Computational Linguistics

Dissertation Director(s):
Chris Paice
Ruslan Mitkov

Dissertation Abstract:

The information overload faced by today's society poses great challenges to
researchers who want to find a relevant piece of information. Automatic
summarisation is a field of computational linguistics which can help humans
to deal with this information overload by automatically extracting the gist
of documents.

This thesis examines automatic summarisation from several angles. First,
it performs qualitative, quantitative and comparative evaluations of
different automatic summarisation methods. These methods are built around a
term-based summariser, which is then augmented with lexical, semantic and
discourse information. These evaluations show that the choice of modules
providing low-level linguistic information (e.g. morphological processors)
does not influence the results significantly, whereas higher-level
linguistic information, such as anaphora resolution and shallow information
about discourse structure, leads to significant improvements in the
summaries.
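
As a rough illustration of a term-based baseline of the kind described
above, the Python sketch below scores each sentence by the summed document
frequency of its words and extracts the top-scoring sentences in document
order. The tokeniser, the stopword list and the function name
`term_based_summary` are assumptions for illustration only; the thesis's
actual summariser and its lexical, semantic and discourse modules are not
reproduced here.

```python
import re
from collections import Counter

# Hypothetical stopword list for this sketch; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "for", "on", "with"}

def tokenize(sentence):
    """Lower-cased word tokens with stopwords removed (a deliberately naive tokeniser)."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def term_based_summary(sentences, k):
    """Return the k sentences with the highest summed term frequency, in document order."""
    # Term frequencies are counted over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))
    # A sentence's score is the sum of the frequencies of its terms.
    scores = [sum(freq[w] for w in tokenize(s)) for s in sentences]
    # Rank indices by score (ties broken by position) and keep the top k.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top)]
```

Summing term frequencies favours long sentences; normalising each score by
sentence length is a common alternative.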

To obtain a comprehensive view of the quality of the summaries produced by
a given method, the evaluation performed in this thesis measures both the
informativeness of the summaries and the quality of their discourse
structure. Moreover, a method for determining the upper limit of
informativeness is proposed in order to show the limits of extraction
techniques. A comparison of informativeness and discourse quality reveals
no correlation between them.
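
The abstract does not say how the upper limit is computed. One common way
to estimate such a bound, which may differ from the thesis's own method, is
to search for the extract of a fixed size that best matches a human-written
reference. The sketch below assumes whitespace tokenisation, unigram recall
as the informativeness measure, and the hypothetical function names
`unigram_recall` and `extraction_upper_bound`; exhaustive search is only
practical for short documents.

```python
from collections import Counter
from itertools import combinations

def unigram_recall(candidate_tokens, reference_tokens):
    """Fraction of reference unigrams (with multiplicity) covered by the candidate."""
    ref = Counter(reference_tokens)
    cand = Counter(candidate_tokens)
    overlap = sum(min(n, cand[w]) for w, n in ref.items())
    return overlap / max(sum(ref.values()), 1)

def extraction_upper_bound(sentences, reference, k):
    """Exhaustively find the k-sentence extract with the highest unigram recall
    against the reference (assumed measure); returns (score, extract)."""
    ref_tokens = reference.lower().split()
    best_score, best_idx = -1.0, ()
    for idx in combinations(range(len(sentences)), k):
        tokens = [w for i in idx for w in sentences[i].lower().split()]
        score = unigram_recall(tokens, ref_tokens)
        if score > best_score:  # strict '>' keeps the earliest extract on ties
            best_score, best_idx = score, idx
    return best_score, [sentences[i] for i in best_idx]
```

For example, `extraction_upper_bound(["a b c", "d e", "a d"], "a d e", 1)`
returns a recall of 2/3 with the extract `["d e"]`: no single sentence covers
all three reference words, which is the kind of ceiling on extraction that
such a bound makes visible.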

A third direction pursued in this research is to replace conventional
iterative extraction methods, which extract one sentence at a time without
considering the other sentences in the summary, with more holistic ones,
in which the decision to extract a sentence depends not only on the content
of that sentence but also on the other sentences extracted. To this end, a
genetic algorithm which encodes the whole summary is implemented and is
shown to produce better summaries than its iterative counterpart.

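The holistic approach can be illustrated with a minimal genetic algorithm
in which each chromosome is a bit vector over the document's sentences
(1 = extracted), so selection and variation operate on whole summaries
rather than on individual sentences. The fitness function (the number of
distinct words covered by the extract), the operators, the parameters and
all function names are assumptions made for this sketch; the abstract does
not describe the thesis's actual encoding or fitness function.

```python
import random
import re

def tokens(sentence):
    """Set of lower-cased word tokens in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def fitness(chromosome, sentences):
    """Hypothetical holistic fitness: distinct words covered by the whole extract.
    A sentence only helps if it adds words the other selected sentences lack."""
    chosen = [tokens(s) for bit, s in zip(chromosome, sentences) if bit]
    return len(set().union(*chosen))

def random_chromosome(n, k, rng):
    """Bit vector with exactly k ones: one bit per sentence, 1 = extracted."""
    ones = set(rng.sample(range(n), k))
    return [1 if i in ones else 0 for i in range(n)]

def crossover(a, b, k, rng):
    """Uniform crossover, then repair so the child selects exactly k sentences."""
    child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
    on = [i for i, bit in enumerate(child) if bit]
    off = [i for i, bit in enumerate(child) if not bit]
    rng.shuffle(on)
    rng.shuffle(off)
    while len(on) > k:  # too many sentences: drop random ones
        child[on.pop()] = 0
    while len(on) < k:  # too few sentences: add random ones
        i = off.pop()
        child[i] = 1
        on.append(i)
    return child

def mutate(chromosome, rng, rate):
    """With probability `rate`, swap one selected sentence for an unselected one."""
    child = chromosome[:]
    on = [i for i, bit in enumerate(child) if bit]
    off = [i for i, bit in enumerate(child) if not bit]
    if on and off and rng.random() < rate:
        child[rng.choice(on)] = 0
        child[rng.choice(off)] = 1
    return child

def genetic_summary(sentences, k, pop_size=20, generations=50, rate=0.2, seed=0):
    """Evolve whole k-sentence extracts and return the fittest, in document order.
    Assumes k <= len(sentences) and pop_size >= 4."""
    rng = random.Random(seed)
    n = len(sentences)
    pop = [random_chromosome(n, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, sentences), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, k, rng), rng, rate))
        pop = parents + children
    best = max(pop, key=lambda c: fitness(c, sentences))
    return [s for bit, s in zip(best, sentences) if bit]
```

Because a sentence's contribution to fitness depends on what the rest of
the extract already covers, a duplicate of an already selected sentence adds
nothing, which a method scoring sentences one at a time cannot take into
account.
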