14.3048, Review: Computational Ling: Gaustad (2003)

Sun Nov 9 06:14:48 UTC 2003

LINGUIST List:  Vol-14-3048. Sun Nov 9 2003. ISSN: 1068-4875.

Subject: 14.3048, Review: Computational Ling: Gaustad (2003)

Moderators: Anthony Aristar, Wayne State U.<aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Sheila Collberg, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Naomi Ogasawara <naomi at linguistlist.org>
 ==========================================================================
What follows is a review or discussion note contributed to our Book
Discussion Forum.  We expect discussions to be informal and
interactive; and the author of the book discussed is cordially invited
to join in.

If you are interested in leading a book discussion, look for books
announced on LINGUIST as "available for review." Then contact
Simin Karimi at simin at linguistlist.org.

=================================Directory=================================

1)
Date:  Sat, 08 Nov 2003 23:04:21 +0000
From:  Vittoria Prencipe <vittoriaprencipe at hotmail.com>
Subject:  Computational linguistics in the Netherlands 2002

-------------------------------- Message 1 -------------------------------

Date:  Sat, 08 Nov 2003 23:04:21 +0000
From:  Vittoria Prencipe <vittoriaprencipe at hotmail.com>
Subject:  Computational linguistics in the Netherlands 2002

Gaustad, Tanja, ed. (2003) Computational Linguistics in the
Netherlands 2002: Selected Papers from the Thirteenth CLIN Meeting,
Rodopi, Language and Computers: Studies in Practical Linguistics 47.

Announced at http://linguistlist.org/issues/14/14-2102.html

Vittoria Prencipe, Università Cattolica ''Sacro Cuore'' di Milano,
unaffiliated scholar.

DESCRIPTION OF THE BOOK

This volume is the result of the 13th CLIN (Computational Linguistics
in the Netherlands) Meeting, in 2002.

The book opens with the talk, in Dutch, by the invited speaker, Hugo
Brandt Corstius: De desillusie van mijn leven of Remember November, a
discussion of the impossibility of Machine Translation.

Of the 18 papers delivered at the conference, only 10 are published in
this book, ordered alphabetically by author. The topics of the papers
range widely, reflecting the diversity of current research in
Computational Linguistics (CL).

In the first paper, Parallel Communicating Finite Transducer Systems,
Erzsébet Csuhaj-Varjú, Carlos Martín-Vide, and Victor Mitrana
discuss a new approach for constructing a finite transducer. The aim
of the authors is to extend ''the concept of parallelism and
communication from the grammar systems area to systems of finite
transducer'' (p. 10). The model they propose makes use of
cooperation and communication in order to increase the computational
power of the components and to decrease the complexity of the
different tasks through distribution and parallelism. The exposition
is fluent and the results are good. Parallel communicating finite
transducer systems provide interesting material for further study,
which the authors outline well in their conclusions (pp. 20-21).

The second paper, Extending a Finite State Approach for Parsing Commas
in English to Dutch, by Sebastian van Delden and Fernando Gomez,
aims to identify syntactic differences in comma usage between
English and Dutch using a comma-tagging system. ''This approach
combines a set of simple deterministic finite state automata and a
greedy learning algorithm to assign descriptive tags to the commas in
a sentence'' (p. 25), and it is a necessary component of any finite
state partial parser. Testing the system on several Dutch and English
corpora shows that, as in English, a comma tagger for Dutch plays a
crucial role in a language processing system by resolving important
syntactic issues.

The next contribution, Handling Disfluencies in Spontaneous Language
Models, by Jacques Duchateau, Tom Laureys, Kris Demuynck, and Patrick
Wambacq, treats the automatic recognition of spontaneous speech. This
is one of the main topics in speech research, and its practical
applications include voice-operated telephone services, automatic
transcription of meetings, automatic closed captioning of TV
programmes, and control of handheld devices. The paper is organized as
follows. First, the authors enumerate the obstacles to the accurate
recognition of spontaneous speech; next they describe some
experiments in spontaneous language modelling for automatic speech
recognition. Then they outline the standard architecture of a large
vocabulary spontaneous speech recognizer (pp. 41-42), explain the
problems of spontaneous language modelling, and present their
research (pp. 42-45); finally they describe the experimental setup
and give results on a recognition task (pp. 46-48).

In the first experiment, on the manipulated context, disfluencies are
automatically removed: ''this turned out to be beneficial for
repetitions, while having a bad effect on contexts containing
hesitations'' (p. 48). In the second experiment, on the manipulated
and non-manipulated prediction context, the result is disappointing:
in some cases disfluencies are strongly correlated with lexical
choice. The authors therefore suggest combining their model with
acoustic-prosodic information and making the LM's automatic context
selection more accurate.
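
As a rough illustration of what manipulating the prediction context
can mean (a sketch for the reader, not the authors' implementation),
the Python fragment below removes immediate repetitions and filled
pauses from the word history before it is handed to an n-gram model.
The hesitation inventory and the name clean_context are assumptions
made for this example.

```python
# Hypothetical inventory of filled pauses; the paper's own list is not
# given in this review.
HESITATIONS = {"uh", "uhm", "eh"}


def clean_context(words, drop_hesitations=True, drop_repetitions=True):
    """Return the word history with the selected disfluencies removed."""
    cleaned = []
    for w in words:
        if drop_hesitations and w in HESITATIONS:
            continue  # skip filled pauses
        if drop_repetitions and cleaned and cleaned[-1] == w:
            continue  # skip an immediate repetition of the previous word
        cleaned.append(w)
    return cleaned


history = ["ik", "ik", "uh", "ga", "naar", "naar", "huis"]
print(clean_context(history))
```

The two flags are kept separate because, according to the quoted
result, removing repetitions helped while removing hesitations did
not.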

In the following paper, Learning to Segment Speech with
Self-Organising Maps, James Hammerton presents a new approach that
employs a self-organising map (SOM) to create an unsupervised
connectionist model of speech segmentation. ''The SOM was chosen
because it is both biologically plausible and is an unsupervised
learner'' (p. 53). The author first describes how the standard SOM
operates (pp. 53-55); then he adapts the standard SOM to speech
segmentation. The first modification deals with the memory of the
SOM. The standard SOM has no memory: it can map individual inputs but
cannot map sequences. The modified SOM proceeds as usual at the start
of a sequence, but as each subsequent input is presented, the value of
the previous input and the pattern representing the new input are
added together, and this continues until the end of the sequence. The
author then presents a series of experiments (pp. 56-60) and discusses
the results. The results are encouraging: the SOM is sensitive to the
phonotactic regularities in utterances, and can become sensitive to
phonotactics in child-directed speech (as the experiments
demonstrate), so it could be applied to the problem of speech
segmentation with good results. Still, ''the modelling of speech
segmentation is a field in its infancy''... (p. 55).
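
To make the memory modification concrete, here is a minimal Python
sketch of a one-dimensional SOM whose input at each step is the running
sum of the patterns presented so far in the sequence. It follows the
summary given in this review, not the paper's actual equations; the
function names, the learning rate, and the simple fixed-radius
neighbourhood are all assumptions.

```python
def best_unit(weights, x):
    """Index of the unit whose weights are closest to x (squared distance)."""
    def dist(i):
        return sum((w - v) ** 2 for w, v in zip(weights[i], x))
    return min(range(len(weights)), key=dist)


def present_sequence(weights, sequence, rate=0.1, radius=1):
    """Train a 1-D SOM on one sequence in place; return the winner per step."""
    if not sequence:
        return []
    memory = [0.0] * len(sequence[0])
    winners = []
    for pattern in sequence:
        # Assumed memory mechanism: add the new pattern to what came before.
        memory = [m + p for m, p in zip(memory, pattern)]
        win = best_unit(weights, memory)
        # Move the winner and its map neighbours towards the summed input.
        for i in range(len(weights)):
            if abs(i - win) <= radius:
                weights[i] = [w + rate * (m - w)
                              for w, m in zip(weights[i], memory)]
        winners.append(win)
    return winners


units = [[0.0, 0.0], [2.0, 0.0]]
print(present_sequence(units, [[0.5, 0.0], [1.5, 0.0]], rate=0.5, radius=0))
```

Because the memory is reset only at the start of a sequence, the same
pattern can be mapped to different units depending on what preceded it,
which is what lets the map pick up on sequential regularities.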

The next paper, How is Grammatical Gender Processed?, by Christer
Johansson, introduces ''a problem for computational models that
process language by learning and using generalizations'': the
existence of paradigmatic gaps (p. 65). The author analyses the case
of the Swedish and Norwegian adjective paradigms; he presents a corpus
study and a reaction time experiment. ''The corpus study estimated how
exclusive the problematic context is. The reaction time experiment
shows that the problematic adjectives have significantly longer
decision times than congruent or non-congruent...'' (p. 66). The
experiment thus shows that the problematic adjectives are perceived
differently from ordinary incongruence and nonsense words. So a
''lazy learner is a more plausible model, as it first stores positive
exemplars, and later it may find out that there are non examples of a
specific combination of factors, some of which factors may have
emerged after exemplars are collected'' (p. 74).

In the next paper, BaseNP Chunking using ILP, Stasinos Konstantopoulos
discusses ''the application of Inductive Logic Programming (ILP) to
the task of BaseNP Chunking'' (p. 77). The first part of the
contribution (pp. 77-80) is devoted to the description of ILP, a
method that generates ''knowledge, a hypothesis, within the bounds of
a given theoretical framework and prior world knowledge, the
background knowledge'' (ibid.). The second part (pp. 80-83) examines
text chunking, ''a form of shallow parsing that amounts to identifying
non-recursive, non-overlapping constituents chunks in a sentence,
without assigning internal structure to the chunks'' (p. 80). Finally
(pp. 83-88) the author focuses on the experimental use of ILP to
construct a BaseNP chunker in Prolog, and reports the results of the
experiment (pp. 88-90).
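
For readers unfamiliar with chunking, the short Python sketch below
shows what a non-recursive, non-overlapping analysis looks like in
practice. It assumes the widely used IOB encoding (B = first word of a
chunk, I = inside a chunk, O = outside any chunk), which this review
does not confirm the paper uses, and it says nothing about how ILP
learns the tags; the function name is hypothetical.

```python
def iob_to_chunks(tags):
    """Convert IOB tags to (start, end) word spans, end exclusive."""
    chunks, start = [], None
    for i, tag in enumerate(tags):
        opens_chunk = tag == "B" or (tag == "I" and start is None)
        if tag == "O" or opens_chunk:
            if start is not None:
                chunks.append((start, i))  # close the chunk in progress
                start = None
            if opens_chunk:
                start = i  # open a new chunk at this word
        # An "I" tag with a chunk already open simply extends that chunk.
    if start is not None:
        chunks.append((start, len(tags)))
    return chunks


words = ["The", "old", "man", "saw", "a", "dog"]
tags = ["B", "I", "I", "O", "B", "I"]
print([" ".join(words[s:e]) for s, e in iob_to_chunks(tags)])
```

Each word belongs to at most one span and spans never nest, which is
exactly the flat structure the quoted definition describes.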

The following contribution, A Dutch Chunker as a Basis for the
Extraction of Linguistic Knowledge, by Kristina Spranger and Ulrich
Heid, describes the functioning of a ''robust and efficient tool for
the extraction of linguistic information from large text corpora''
(p. 93).  The authors first give a definition of a chunk and describe
two grammars providing a deep syntactic analysis (pp. 94-99); then
they describe their model and the three levels of the chunking
process: 1. the introduction of most of the lexical information and
the building of chunks; 2. the main chunking itself; 3. the checking
of structures and the building of syntactic hierarchies. Finally
they describe a real application of the chunker (pp. 99-102) and
present the results (pp. 102-108).

The following paper, Morpho-Syntactic Agreement and Index Agreement in
Dutch NPs, by Frank Van Eynde, focuses on the existence of
morphosyntactic agreement and index agreement and on the relation
between them. Wechsler and Zlatić (2000, p. 508) contend that ''the
index agreement does not apply to NP-internal elements such as
determiners and adjectives'', but Van Eynde argues that ''this claim
is too strong for Dutch'' (p. 112). First he analyses Dutch NPs and
adopts the distinction between marked and unmarked nominals proposed
in Allegranza (1998) and Van Eynde (2003); then he describes the use
of the type head-functor-phrase to model the combination of a noun
with its prenominal dependents (p. 113). Next he spells out the
details of NP-internal morphosyntactic agreement (pp. 114-121) and
discusses two instances of NP-internal index agreement: marked nouns
(pp. 122-124) and predeterminers (pp. 125-126). The conclusions of
this paper are twofold: 1. ''the combination of prenominals with
unmarked nominals is subject to morphosyntactic agreement in case,
declension, number and gender''; 2. ''the combination of prenominals
with marked nominals is not subject to morphosyntactic agreement, but
to index agreement'' (p. 126).

In the next paper, Harvesting Dutch Trees: Syntactic Properties of
Spoken Dutch, Ton van der Wouden, Ineke Schuurman, Machteld Schouppe,
and Heleen Hoekstra examine word order phenomena in Dutch. Their
research draws on the Spoken Dutch Corpus (CGN), a major resource for
contemporary spoken Dutch. In principle, word order in Dutch is
relatively free, but in practice this is not really true. ''This
paper seeks to investigate in a quantitative way some of the
peculiarities of Dutch word order'' (p. 129). First the authors
describe the corpus (pp. 129-130) and introduce some of the tools for
exploring it (pp. 130-131). Then they present the results of their
exploration of the CGN with respect to syntactic aspects of Dutch
(pp. 132-138), particularly the position of the subject and the verb
cluster. Naturally, only the surface of the possibilities has been
scratched, but the first results corroborate the assumption that in
the unmarked case subjects occupy the first position in main clauses
(p. 139).

In the last paper, Improving a Spelling Checker for Afrikaans, Menno
van Zaanen and Gerhard van Huyssteen describe the development of an
improved spelling checker for Afrikaans. First the authors examine the
existing spelling checkers for Afrikaans and evaluate them for
user-friendliness and performance (pp. 144-146). Since the results
are not very encouraging, they set out to improve the model. They
then describe the general architecture of existing spelling checkers
(pp. 148-149) and that of the improved one, which adds morphological
information, an n-gram analysis, and an error lexicon (pp. 151-152).
Finally they discuss the remaining problems (p. 154).
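
How a word list, an error lexicon, and an n-gram analysis might work
together can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the decision order, the use of
character trigrams, the labels returned, and the toy word lists are
invented for this example, and the morphological component is left
out entirely. It is not the authors' architecture.

```python
def trigrams(word):
    """Character trigrams of a word padded with boundary markers."""
    padded = "#" + word + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def check_word(word, lexicon, error_lexicon):
    """Return (label, suggestion) for a single word."""
    if word in lexicon:
        return ("ok", None)
    if word in error_lexicon:
        # A known misspelling with a stored correction.
        return ("error", error_lexicon[word])
    # Unknown word: is every trigram attested somewhere in the lexicon?
    attested = set().union(*(trigrams(w) for w in lexicon))
    if trigrams(word) <= attested:
        return ("plausible", None)
    return ("suspect", None)


# Toy data for illustration only.
lexicon = {"huis", "boek", "skool"}
error_lexicon = {"skoel": "skool"}
for w in ["boek", "skoel", "xqz"]:
    print(w, check_word(w, lexicon, error_lexicon))
```

An error lexicon gives exact corrections for frequent mistakes, while
the n-gram test offers a cheap fallback for words that are simply
absent from the word list.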

EVALUATION

The aim of this book is clearly to mirror the diversity of current
research in CL, and this aim is achieved. The individual contributions
are the result of the empirical application of different models, all
theoretically well founded, so the intended audience for this volume
is professionals and advanced students. All the contributions are very
interesting and well organized, but the writing is dense and technical.

ABOUT THE REVIEWER

Vittoria Prencipe, Ph.D., works as a postdoctoral researcher in the
field of Translation Studies at the Università Cattolica "Sacro
Cuore", Milan (Italy). Her current research deals with the application
of a Sense-Text model to the field of linguistic translation.

---------------------------------------------------------------------------

If you buy this book please tell the publisher or author
that you saw it reviewed on the LINGUIST list.

---------------------------------------------------------------------------