12.3106, Review: Recent Advances in Computational Terminology

LINGUIST List linguist at linguistlist.org
Sun Dec 16 22:45:54 UTC 2001


LINGUIST List:  Vol-12-3106. Sun Dec 16 2001. ISSN: 1068-4875.

Subject: 12.3106, Review: Recent Advances in Computational Terminology

Moderators: Anthony Aristar, Wayne State U.<aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, EMU
	Jody Huellmantel, WSU		James Yuells, WSU
	Michael Appleby, EMU		Marie Klopfenstein, WSU
	Ljuba Veselinova, Stockholm U.	Heather Taylor-Loring, EMU
	Dina Kapetangianni, EMU		Richard Harvey, EMU
	Karolina Owczarzak, EMU		Renee Galvis, WSU

Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
          Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.



Editor for this issue: Terence Langendoen <terry at linguistlist.org>
 ==========================================================================
What follows is another discussion note contributed to our Book
Discussion Forum.  We expect these discussions to be informal and
interactive; and the author of the book discussed is cordially invited
to join in.

If you are interested in leading a book discussion, look for books
announced on LINGUIST as "available for discussion."  (This means that
the publisher has sent us a review copy.)  Then contact
Simin Karimi at simin at linguistlist.org
or Terry Langendoen at terry at linguistlist.org.



=================================Directory=================================

1)
Date:  Sun, 16 Dec 2001 21:28:28 +0100
From:  Federica Da Milano <chiccadm at tin.it>
Subject:  Review of Bourigault et al, Recent Advances in Computational
         Terminology

-------------------------------- Message 1 -------------------------------

Date:  Sun, 16 Dec 2001 21:28:28 +0100
From:  Federica Da Milano <chiccadm at tin.it>
Subject:  Review of Bourigault et al, Recent Advances in Computational
         Terminology

Bourigault, Didier, Christian Jacquemin, and Marie-Claude L'Homme, eds.
(2001) Recent Advances in Computational Terminology. John Benjamins
Publishing Company, xvii+375pp, hardback ISBN 1-58811-016-8, $99.00,
Natural Language Processing 2.

Federica Da Milano, Department of Linguistics, University of Pavia,
Italy

INTRODUCTION
The book is an edited collection of 17 articles from researchers in
automatic analysis, storage, and use of terminology, and specialists in
applied linguistics, computational linguistics, information retrieval,
and artificial intelligence.

The book follows the First Workshop on Computational Terminology which
took place at COLING-ACL'98. The goal of the workshop was to bring
researchers from different scientific communities together, leading to
the recognition of a field that can now be called 'computational
terminology'. The book contains the extended and revised versions of
the papers published in the proceedings. The contributions reflect the
innovative and fruitful advances in computational terminology at the
crossroads of terminology, linguistics and computer science. The
contributions show a wide range of fields for which computational
terminology tools are developed. The following applications are covered
in the book: information retrieval, the building of bilingual lexicons,
terminography, and automatic abstracting. The book shows that the
techniques and approaches computational terminologists use to improve
term extraction or to assist terminology-related applications share a
few similarities but differ in many respects.

SUMMARY
1. A graph-based approach to the automatic generation of multilingual
keyword clusters (Akiko Aizawa and Kyo Kageura)

The authors introduce a graph-based approach to the automatic
generation of Japanese and English bilingual keyword clusters from the
keyword lists that authors assign to their academic papers. Each
generated cluster contains keywords with similar meanings from both
languages. After describing how the data were extracted, the authors
provide a useful overview of the state of the art and situate their
work against the background of current research. The major advantages of this
methodology are that, unlike statistical methods, clusters can be
properly generated for low frequency keywords as well as high frequency
keywords, and computational costs are relatively low.
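
To make the graph idea concrete, here is a minimal sketch in Python. It
is not the authors' algorithm, whose details are not given in this
review: it simply links Japanese and English keywords that were
assigned to the same paper and reads off connected components as
clusters. The function name keyword_clusters and the sample data are
invented for illustration.

```python
from collections import defaultdict


def keyword_clusters(papers):
    """Group keywords into clusters via connected components.

    `papers` is a list of (japanese_keywords, english_keywords) pairs,
    one pair per paper. This is an illustrative simplification, not the
    method described in the chapter.
    """
    # Build an undirected bipartite graph: a Japanese keyword is linked
    # to every English keyword assigned to the same paper.
    graph = defaultdict(set)
    for ja_keywords, en_keywords in papers:
        for ja in ja_keywords:
            for en in en_keywords:
                graph[("ja", ja)].add(("en", en))
                graph[("en", en)].add(("ja", ja))

    # Depth-first search to collect each connected component.
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(component)
    return clusters


# Hypothetical keyword lists for three papers.
papers = [
    (["情報検索"], ["information retrieval"]),
    (["情報検索"], ["IR"]),
    (["機械翻訳"], ["machine translation"]),
]
clusters = keyword_clusters(papers)
```

Note that a keyword seen only once still lands in a cluster, since
membership depends on the links rather than on frequency counts. This
is consistent with the advantage for low frequency keywords mentioned
above.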

2. The automatic construction of faceted terminological feedback for
interactive document retrieval (Peter G. Anick)

In this paper the author presents a linguistic method for the automatic
construction of terminological feedback for use in interactive
information retrieval. The author underlines that query terms are
likely to match many irrelevant documents. This potential mismatch of
query terms and document terms is not the only problem facing the
online information seeker. The information need itself may be poorly
defined in the user's mind. In his opinion, of all the techniques for
generating terminological feedback for use in interactive query
refinement, faceted feedback schemes are unique in providing users not
only terminology but also an explicit framework for reasoning along the
multiple dimensions that characterize a domain. His approach to the
automatic generation of terminological feedback, like a faceted
classification, structures terminology along salient dimensions. The
approach is based on the observation that key domain concepts within
databases and result sets tend to participate in families of
semantically related lexical compounds. The result is a system which
dynamically generates terminological feedback for query result sets.

3. Automatic term detection. A review of current systems (M. Teresa
Cabré Castellví, Rosa Estopà Bagot and Jordi Vivaldi Palatresi)

This paper is a useful survey of a number of recently developed term
extraction systems. All systems are analysed and compared against a set
of technically relevant characteristics. The aim of the paper is to
analyse the main terminology extraction systems in order to describe
their current status and thus make it possible to improve them. The
paper is divided into two main parts. The first and largest part
describes various terminology extraction systems, each with a short
evaluation outlining its weak and strong points.
The systems described are: ANA, CLARIT, Daille-94 (ACABIT),
FASTR, HEID, LEXTER, NAULLEAU, NEURAL, NODALIDA-95, TERMIGHT, TERMINO,
TERMS. In the second part, the terminology extraction systems are
classified according to a set of parameters. The contrastive analysis of
these systems rests on six aspects relevant to the design of a new
system for detecting terminological units: linguistic resources,
strategies of term delimitation, strategies of term filtering,
classification of recognised terms and obtained results.

4. Incremental extraction of domain-specific terms from online text
resources (Lee-Feng Chien and Chun-Liang Chen)

This paper presents an efficient approach which can classify online
text collections from the Internet dynamically and extract domain-
specific terms incrementally. The approach is based on a live
dictionary with online information systems on the Internet, in which
most of the domain-specific terms can be incrementally extracted and
adapted with changes in text collections. Such a live dictionary can
reflect up-to-date information and will be very helpful in providing
real-time information service. On the other hand, as an unlimited number
of corpus resources are available over the Internet, the proposed
approach also attempts to find an automatic way to organize the text
collections which are growing daily. This approach is based on proper
integration of linguistic knowledge acquisition (Zernik 1991) and IR
technologies, and is an extension of previous work (Chien 1997),
which was originally designed to extract Chinese terms with correct
lexical boundaries from a large but static text collection. Although
this work is mainly designed for Chinese and oriental language
applications, some of the developed techniques are believed to be
applicable to Western languages.

5. Knowledge-based terminology management in medicine (James J. Cimino)

The author presents a specific knowledge base in the field of medicine and
discusses the addition of new terminology to the existing semantic
network. In medicine, computer systems are often integrated to
facilitate data sharing, but the terminologies they use are typically
not integrated. There is rarely any centralized repository of terms
used in various systems, and no widely accepted standards exist. The
author describes an exception, the Medical Entities Dictionary (MED),
at the New York Presbyterian Hospital (NYPH). The paper describes the
design and development of the terminology and presents two case examples
demonstrating some of the advantages of this approach: the addition of a
new terminology of laboratory terms and maintenance of an existing drug
terminology.

6. Searching for and identifying conceptual relationships via a corpus-
based approach to a Terminological Knowledge Base (CTKB). Methods and
results (Anna Condamines and Josette Rebeyrolle)

This paper shows how conceptual information on terms can be found in
corpora. The authors present different methods for extracting
information that can later be used to build terminological knowledge
bases based on corpora (as opposed to application-based terminological
knowledge bases). The results presented in this paper aim to show the
feasibility of constructing a corpus-based approach to a Terminological
Knowledge Base (CTKB). The authors show that it is possible, with
appropriate systems and linguistic interpretation, to model a text,
particularly the conceptual relationships contained in it. This method
is applied to a French corpus and the results are assessed from the
point of view of various applications.

7. Qualitative terminology extraction. Identifying relational
adjectives (Béatrice Daille)

This paper presents the identification in corpora of relational
adjectives in French. First, the author defines and gives some
linguistic properties of relational adjectives (AdjR). Then, she
presents the termer (term extractor) and the modifications that she
carried out in order to allow the identification of relational
adjectives in texts. Relational adjectives and compound nouns which
include a relational adjective are then quantified and their
informative status is evaluated against a thesaurus of the domain.
The results corroborate linguistic studies and their intuitions
about the informative character of relational adjectives. The
author concludes with a discussion of the interesting status of such
adjectives and compound nouns for terminology extraction and other
automatic terminology tasks.

8. General considerations on bilingual terminology extraction (Eric
Gaussier)

The paper addresses general questions about bilingual terminology
extraction. The author presents a standard characterization of terms
based on morpho-syntactic patterns. Then, using French and English as
the language pair to illustrate his discussion, he shows how the
specifications of terms in two different languages impact the alignment
process. In the second part of the paper, the author reviews three
methods which could be well adapted to bilingual terminology alignment.
The paper shows that, in order to account for the differences that
exist between two languages, the alignment methods must be very
flexible.

9. Detection of synonymy links between terms. Experiment and results
(Thierry Hamon and Adeline Nazarenko)

This paper focuses on a specific semantic relationship: synonymy. The
authors evaluate a method for detecting synonymy links between
terminological units contained in a specialized corpus. This paper
reports new experiments which help to understand how this synonymy
detection approach is to be used. This method makes use of machine-
readable dictionaries (general-language dictionaries as well as
specialized dictionaries) to infer synonymy relationships among the
components of complex terms. Results show the complementarity and the
usefulness of the different sources.

10. Extracting useful terms from parenthetical expressions by combining
simple rules and statistical measures. A comparative evaluation of
bigram statistics (Toru Hisamitsu and Yoshiki Niwa)

Another method of enhancing term quality is to select specific zones in
texts, for example parenthetical expressions, from which terms can be
acquired in correlation with other terms. This article presents such a
method and provides a comprehensive study of the statistical criteria
used to select relevant terms. Parenthetical expressions
are pairs of character strings A and B related to each other by
parentheses as in A(B). These expressions contain a large number of
important terms, such as organisation names, company names, and their
abbreviations, and are easily extracted by pattern matching. The
authors present a simple and accurate method for collecting unregistered
terms from parenthetical expressions; it identifies two types of
parenthetical expressions by using pattern matching, bigram statistics,
and entropy.
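
As a rough illustration of the pattern matching step only, the A(B)
shape can be picked out with a regular expression. The sketch below is
an assumption-laden toy in Python, not the authors' procedure: it works
on space-delimited English rather than Japanese, takes A to be the
single token before the parenthesis, and leaves out the bigram
statistics and entropy filtering altogether. The names PAREN,
parenthetical_pairs and the sample text are invented.

```python
import re
from collections import Counter

# A is crudely taken to be the single token right before the parenthesis;
# B is everything inside a pair of non-nested parentheses.
PAREN = re.compile(r"(\S+)\s?\(([^()]+)\)")


def parenthetical_pairs(text):
    """Count (A, B) pairs found in expressions of the form A(B)."""
    return Counter((a, b.strip()) for a, b in PAREN.findall(text))


# Hypothetical sample text.
sample = (
    "The World Health Organization (WHO) met. "
    "Organization (WHO) staff agreed. "
    "IBM (International Business Machines) declined."
)
pairs = parenthetical_pairs(sample)
```

The resulting counts are the kind of raw material that a statistical
filter would then rank or prune. Deciding where A really begins is
exactly the part this toy pattern does not attempt.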

11. Software tools to support the construction of bilingual terminology
lexicons (David A. Hull)

This paper presents a case study on the problem of bilingual lexicon
extraction. The author evaluates an existing terminology alignment
system by comparing its performance to that of human experts working on
the same task, the construction of a bilingual lexicon from a corpus
provided by the European Court of Human Rights. Then, he describes an
automatic terminology alignment algorithm that can be used as a
valuable pre-processing step in the interactive process of lexicon
construction. Finally, he presents a quantitative comparison of
automatic and manual alignment strategies.

12. Determining semantic equivalence of terms in information retrieval.
An approach based on context distance and morphology (Hongyan Jing and
Evelyne Tzoukermann)

This paper presents an approach useful in Information Retrieval to
determine the semantic equivalence between terms in a query and terms
in a document. This approach is based on context distance and
morphology. Context distance is a measure used to assess the closeness
of word meanings. This context distance model compares the similarity
of the contexts where a word appears, using the local document
information and the global lexical co-occurrence information derived
from the entire set of documents to be retrieved.  This method
integrates this context distance model with morphological analysis so
that the two operations can enhance each other.
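
The general notion of comparing the contexts in which two words appear
can be sketched as follows. This Python example is an assumption, not
the measure defined in the paper: it builds simple co-occurrence counts
within a fixed window and uses one minus the cosine similarity as the
distance. The function names, the window size and the sample sentences
are all invented for illustration.

```python
import math
from collections import Counter


def context_vector(word, sentences, window=2):
    """Count the tokens found within `window` positions of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != word:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def context_distance(u, v):
    """Return 1 minus the cosine similarity of two context vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no context observed: treat as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical mini-corpus.
sentences = [
    "the doctor treated the patient",
    "the physician treated the patient",
    "the train left the station",
]
doctor = context_vector("doctor", sentences)
physician = context_vector("physician", sentences)
train = context_vector("train", sentences)

d_close = context_distance(doctor, physician)
d_far = context_distance(doctor, train)
```

In this toy corpus "doctor" and "physician" share identical contexts,
so their distance is smaller than the distance between "doctor" and
"train".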

13. Term extraction using a similarity-based approach (Diana Maynard
and Sophia Ananiadou)

The authors integrate syntactic and semantic information to find, rank
and disambiguate terminological units. The paper describes a new
thesaurus-based similarity measure, which uses semantic information to
calculate the importance of different parts of the context in relation
to the term. The authors claim context is the "key to understanding a
term". This method relies on the hypothesis that terms tend to occur in
groups, rather than singly or randomly, in other words that "terms are
better indicators of other terms". Results show that making use of
semantic information is beneficial for both theoretical and practical
aspects of terminology.

14. Extracting knowledge-rich contexts for terminography. A conceptual
and methodological framework (Ingrid Meyer)

The author has developed a method designed to extract information on
terms from running text in order to assist terminographers in their
everyday work. They can focus on knowledge-rich contexts, i.e. contexts
that contain relevant information on terminological units that is
signaled with specific patterns. First, the paper defines the concept
of a knowledge-rich context (KRC) by providing an analysis of the two
main types of KRC. Then, the author describes a methodology for
developing extraction tools that is based on lexical, grammatical and
paralinguistic patterns. Finally, there is a discussion of the most
pressing research problems of the field.

15. Experimental evaluation of ranking and selection methods in term
extraction (Hiroshi Nakagawa)

The author proposes techniques for the ranking and the classification
of candidate terms that rely on structural and statistical properties.
He claims that the relationship between complex terms and the simple
terms they include must be analyzed; according to the author, this is
essential in estimating candidate term importance. The paper compares
experimentally the performance of two term extraction methods: the
C-value based method (Frantzi and Ananiadou 1996) and the Imp based
method (Nakagawa 1997). The author carried out the experimental evaluation with
several Japanese technical manuals.

16. Corpus-based extension of a terminological semantic lexicon (A.
Nazarenko, P. Zweigenbaum, B. Habert and J. Bouaud)

This paper proposes a method for adapting a terminological semantic
lexicon to meet the requirements of new domains and corpora. The tuning
method described explores the corpus and gathers words that are likely
to have similar meanings on the basis of their dependency relationships
in the corpus. The tagging procedure is tested and parameterised on a
rather small French corpus dealing with coronary diseases. This method
is systematically evaluated by creating and categorizing artificial
unknown words. The results show that the tagging procedure is a
valuable help in accounting for new words and new word uses in a
sublanguage.

17. Term extraction for automatic abstracting (Michael P. Oakes and
Chris D. Paice)

The authors offer a template-based technique for term extraction that
instantiates semantic roles of contextual words during the extraction
process. The paper describes term extraction from full length journal
articles in the domain of crop husbandry for the purpose of producing
abstracts automatically. Initially, candidate terms are extracted which
occur in one of a number of fixed lexical environments. Candidate terms
which can be lexically validated receive an enhanced weight. The
grammar for lexical validation was derived from a training corpus of 50
journal articles. Selected terms may be used to generate a short
abstract which indicates the subject matter of the paper.

COMMENTS
The articles collected in this book cover a topic of interest not only
to specialists. They can shed light on the structure
of the lexicon: "I don't think there can be any corpora, however large,
that contain information about all of the areas of English lexicon and
grammar that I want to explore ... [but] every corpus I have had the
chance to examine, however small, has taught me facts I couldn't
imagine finding out any other way." (Fillmore 1992:35)

REFERENCES
Chien, L. F. (1997), PAT-Tree Based Keyword Extraction for Chinese
Information Retrieval, Proceedings of ACM SIGIR ’97, Philadelphia, USA,
50-58

Fillmore, C. (1992), Corpus linguistics or Computer-aided armchair
linguistics. In Svartvik, J. (ed.), Directions in Corpus linguistics,
Mouton de Gruyter, Berlin, 35-60

Frantzi, K. T. and Ananiadou, S. (1996), Extracting nested collocations.
In Proceedings of the 16th International Conference on Computational
Linguistics, 41-46

Nakagawa, H. (1997), Extraction of index words from manuals. In
Conference Proceedings of Computer-Assisted Information Searching on
Internet, 598-611

Zernik, U. (1991), Lexical Acquisition: Exploiting On-line Resources to
Build a Lexicon, Lawrence Erlbaum Associates, Publishers

ABOUT THE REVIEWER
Federica Da Milano is a Ph.D. student in Linguistics at the Department
of Linguistics, University of Pavia, Italy. Her research interests
include linguistic typology, spatial deixis, and negation.


---------------------------------------------------------------------------

If you buy this book please tell the publisher or author
that you saw it reviewed on the LINGUIST list.

---------------------------------------------------------------------------