
LINGUIST List:  Vol-11-285. Fri Feb 11 2000. ISSN: 1068-4875.

Subject: 11.285, Review: Kornai: Extended Finite State Models (Review 2)

Moderators: Anthony Rodrigues Aristar: Wayne State U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie: U. of Arizona <carnie at linguistlist.org>

Reviews: Andrew Carnie: U. of Arizona <carnie at linguistlist.org>

Associate Editors:  Martin Jacobsen <marty at linguistlist.org>
                    Ljuba Veselinova <ljuba at linguistlist.org>
                    Scott Fults <scott at linguistlist.org>
                    Jody Huellmantel <jody at linguistlist.org>
                    Karen Milligan <karen at linguistlist.org>

Assistant Editors:  Lydia Grebenyova <lydia at linguistlist.org>
                    Naomi Ogasawara <naomi at linguistlist.org>
                    James Yuells <james at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Sudheendra Adiga <sudhi at linguistlist.org>
                      Qian Liao <qian at linguistlist.org>

Home Page:  http://linguistlist.org/


Editor for this issue: Andrew Carnie <carnie at linguistlist.org>
 ==========================================================================

What follows is another discussion note contributed to our Book Discussion
Forum.  We expect these discussions to be informal and interactive; and
the author of the book discussed is cordially invited to join in.

If you are interested in leading a book discussion, look for books
announced on LINGUIST as "available for discussion."  (This means that
the publisher has sent us a review copy.)  Then contact Andrew Carnie at
     carnie at linguistlist.org

=================================Directory=================================

1)
Date:  Thu, 20 Jan 2000 19:58:40 +0000
From:  Geoff Williams <geoffw at talk21.com>
Subject:  Review of Kornai's Extended Finite-State Models

-------------------------------- Message 1 -------------------------------

Date:  Thu, 20 Jan 2000 19:58:40 +0000
From:  Geoff Williams <geoffw at talk21.com>
Subject:  Review of Kornai's Extended Finite-State Models


Review of Andras Kornai (ed), Extended Finite State Models of Language.
Cambridge, UK: CUP 1999. ACL Studies in Natural Language Processing
Series. Includes CD-ROM.

Geoff Williams, School of Oriental and African Studies, University of
London.

Synopsis
The book is a selection of seventeen papers, plus one commentary,
originally presented at a workshop organised by the author in Budapest
in August 1996 at the European Conference on Artificial Intelligence
(ECAI 96). The accompanying CD-ROM includes further papers from the
workshop that were subsequently withdrawn by their authors, plus some
of the software discussed in the book, and additional toolkits and
databases which may be used for experimentation. The papers cover a very
wide range of topics in computational linguistics, from formal language
theory to corpus linguistics, with the use of finite-state methods being
the common thread. As the range of work covered is so vast, there will
be few readers sufficiently competent or interested in all of the areas
represented to take in the whole of this book, let alone critically
evaluate it. Below I summarise each of the papers and give brief
evaluations of a selection of them. As I did not attend the original
workshop, I can only review the book as a self-contained work, and
cannot comment on what has been left out.

The following acronyms will be used throughout:
FS:	finite state
FST/A:	finite-state transducer / automaton
NLP:	natural language processing
POS:	part of speech

1) Kornai: Introduction
In his preface Kornai gives the basis for the workshop and a brief
historical overview of the use of finite-state techniques in NLP, tracing
them back to the grep family of Unix text processing tools. His comment
that with the arrival of transformational grammar, "neither careful
attendance to linguistic detail nor husbanding of computational
resources held much appeal" reveals his view that there are theoretical
and not just practical issues at stake in the use of finite-state
techniques.

2) Joshi and Hopely: A parser from antiquity
This paper describes a parser originally developed at the University of
Pennsylvania in the late 1950s, making it almost certainly the earliest
example of finite-state techniques applied to natural language parsing.
Developed as part of the Transformations and Discourse Analysis Project
directed by Zellig Harris, the program's construction was influenced by
both the linguistic theory adopted and the limited computing resources
available at the time. Painstakingly reconstructed by the authors from
the original extensive documentation, the parser is found to display
many features of finite-state systems still used today, and its main
interest is essentially historical. A detailed description of the
parser's operation is provided, and code is included on the accompanying
Cd-rom.

3) Lauri Karttunen's commentary on this paper makes the point that many
of the innovations introduced were forgotten and later rediscovered.
Examples which occur in later papers are the use of multi-word tokens
and an output in the form of a relatively flat syntactic analysis,
similar to that of current "chunking" systems.

4) Watson: Implementing and using Finite Automata Toolkits
This chapter describes in detail the design of a toolkit for generating
FS automata from regular expressions. The toolkit is available
commercially, with a simpler freeware version supplied on the
accompanying Cd-rom. Implemented as a library of C++ classes, it is a
low-level toolkit clearly intended for the serious programmer. As Watson
readily acknowledges, the toolkit was designed primarily with compilers
(for programming languages) in mind but recent interest among
computational linguists has encouraged the "development of a much more
general interface"(p. 19). All known algorithms for constructing FSAs
are implemented, together with 5 more for minimizing deterministic
automata, the comprehensiveness of the package would seem to be beyond
doubt.
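	To give a flavour of what such a toolkit automates, here is a
minimal Python sketch of my own of the classical subset construction,
which turns a non-deterministic automaton into a deterministic one; the
encoding is invented for illustration and is unrelated to the API of
Watson's C++ library.

# Subset construction: each DFA state is the set of NFA states
# reachable on the input read so far.
def determinize(alphabet, delta, start, finals):
    """delta maps (state, symbol) to a set of NFA states."""
    start_set = frozenset([start])
    dfa_delta, dfa_finals = {}, set()
    agenda, seen = [start_set], {start_set}
    while agenda:
        current = agenda.pop()
        if current & finals:
            dfa_finals.add(current)
        for a in alphabet:
            target = frozenset(s for q in current
                               for s in delta.get((q, a), ()))
            dfa_delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                agenda.append(target)
    return dfa_delta, start_set, dfa_finals

# Toy NFA for the language (a|b)*ab:
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
print(determinize('ab', nfa, 0, {2}))
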
	 Detailed performance comparisons of the various algorithms are
provided, which lead to simple guidelines for choosing the best
algorithms for creating the two major types of automaton. Given this
result though, it is not clear why the remaining, inferior, algorithms
would ever be used and hence why they are included in the distribution.
Although the technical content of this work is high-calibre, there is
little to interest the linguist in this chapter, and in parts I felt it
read a little too much like a sales pitch.

5) Vilares et al.: Finite-state morphology and formal verification
This paper describes a software environment for the generation of
non-deterministic finite-state taggers, with reference to a
morphological tagger for Spanish verbs. The complexity of Spanish verbal
morphology, and the size of the resulting tagger (an FS device) is such
that automatic verification is necessary for error-checking and
debugging the tagger, the authors argue. A method is described, dubbed
"verification by reduction", based on the AUTO tool (Madelaine et al,
1989), which displays a trace of the path taken through a reduced
version of the global automaton in tagging (ie parsing) an input form.
They extend the technique, again using AUTO, to the correction of
ill-formed input strings. As the authors admit, this is limited to
substitution errors and so represents only a first step towards general
error correction. Performance graphs of the system are also provided,
although no comparisons with alternatives are given, making objective
evaluation difficult for the uninitiated.
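	I cannot reproduce the AUTO tool here, but the general flavour of
trace-based debugging is easily conveyed: run a form through the
automaton and record the path taken, so that a failure can be
localised. The automaton below is an invented fragment of my own, not
the authors' Spanish tagger.

# Record the path a DFA takes, so a rejected form shows where
# recognition broke down (invented morph-sized transitions).
def trace(delta, start, finals, morphs):
    state, path = start, [start]
    for m in morphs:
        if (state, m) not in delta:
            return path, 'stuck on ' + repr(m)
        state = delta[(state, m)]
        path.append(state)
    return path, 'accepted' if state in finals else 'rejected'

delta = {('STEM', 'habl'): 'THEME', ('THEME', 'a'): 'AGR',
         ('AGR', 'mos'): 'END'}
print(trace(delta, 'STEM', {'END'}, ['habl', 'a', 'mos']))
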
	This chapter suffers from a number of minor but basic typographical
errors and idiosyncratic forms of English (e.g. "outprinted") which make
it a difficult read in places. The work is ongoing, but in the listed
outputs of the system it was never clear to me what had actually been
implemented by the authors and what was due just to features of the AUTO
system being used in a particular way.

6) Tateno et al.: Japanese lexical transducer based on stem-suffix style
forms.
The goal of this paper is to describe a novel implementation of a
Lexical Transducer (LT) for Japanese based on principles of two-level
morphology (Karttunen 1983). Following later proposals of Karttunen, the
standard division of the data into a lexicon and a set of two-level
rules is modified so that idiosyncratic morphological alternations are
moved into the lexicon to reduce the complexity of the rule component.
This idea is pushed further by having stems and suffixes stored in two
separate lexicons, so that the various reflexes of an inflectional
suffix are not described by rules but simply in the suffix lexicon. An
extra set of transducers with simple structures is provided for
capturing the relation between canonical citation forms and stem-suffix
style forms. The overall size of the LT is claimed to be much smaller
than if the standard method had been used. The number of stems covered
by the resulting "full-size" transducer is not given.
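	The division of labour is easy to picture with a toy fragment of my
own (with simplified romanisation): instead of two-level rules deriving
each allomorph, the suffix lexicon lists the surface variants by
conjugation class, and idiosyncratic stem alternants live in the stem
lexicon.

# Stem lexicon stores alternants; suffix lexicon picks an alternant
# and a surface suffix per conjugation class (simplified Japanese).
STEMS = {'kak':  {'class': 'godan-k', 'plain': 'kak', 'onbin': 'kai'},
         'tabe': {'class': 'ichidan', 'plain': 'tabe', 'onbin': 'tabe'}}
SUFFIXES = {'NONPAST': {'godan-k': ('plain', 'u'),
                        'ichidan': ('plain', 'ru')},
            'PAST':    {'godan-k': ('onbin', 'ta'),
                        'ichidan': ('plain', 'ta')}}

def inflect(stem, feature):
    entry = STEMS[stem]
    alternant, suffix = SUFFIXES[feature][entry['class']]
    return entry[alternant] + suffix

print(inflect('kak', 'PAST'))      # kaita  'wrote'
print(inflect('tabe', 'NONPAST'))  # taberu 'eats'
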
	The implementation is supported by examples of the complexity of
Japanese morphological and phonological alternations, and examples of
some of these are given in two-level rule notation. The practical
advantage seems irrefutable, but what is perhaps lacking is some
linguistic motivation for the separate lexicons, and for the extra level
of transducers, a point which is mentioned in passing in the conclusion,
but unfortunately not discussed.

7) Kim and Jang: Acquiring rules for reducing morphological ambiguity
from POS tagged corpus in Korean.
This paper proposes a FSA-based solution to the problem of morphological
ambiguity in Korean words, or 'morphological overanalysis'. This problem
results from the use of simple morphotactic tables in taggers for
Korean: binary relations which encode whether morphemes of two
different lexical categories, or parts of speech (POS), may occur
adjacent to one another within a word. Overanalysis arises because these
tables do not encode the large amount of lexical and syntactic
information that would be required for a completely accurate analysis.
	The proposed solution involves defining subsumption relations (SRs)
based on semantic interpretations of compounds: the more specific, less
ambiguous, analysis / interpretation is said to be subsumed by the more
general one. This relation is defined formally and determined by a set
of subsumption conditions (SCs), which are realised as FSAs, of which
there is one for every POS. An algorithm is given for the automatic
determination of the SCs from a large POS-tagged corpus of Korean word
data. Results for ambiguity reduction fall short of those with
hand-crafted grammars reported by the same authors.

8) Chanod and Tapanainen: Finite-state reductionist parsing for French
The authors propose a shallow syntactic parser for French, consisting of
a non-deterministic tokeniser and a multi-word lexicon (MW) plus a set
of finite-state syntactic constraints. The MW contains numeral strings,
adverbial phrases of time, and the like, but otherwise recognises the
same set of morphological tags as the tokeniser. Outputs are in the form
of standard tags augmented by syntactic relations of the words: e.g.
PPObj for prepositional complement, NounPrMod for noun pre-modifier.
Syntactic rules are encoded in terms of extended regular expressions
involving essentially relational information: what can appear next to
what. The authors try to show how certain non-trivial linguistic
phenomena - argument uniqueness, agreement and apposition - can be
partially captured by interaction of the various rules, in a fashion
reminiscent of the modularity of standard syntactic theory. How much
this contributes to the overall accuracy of the parser is unfortunately
not discussed. Test results on 150 sentences from technical manuals give
95% correct parses, with an average number of parses per sentence of
1.9. Again no comparisons with other systems are provided.
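	The reductionist strategy itself can be mocked up in a few lines:
each word starts with all of its lexical readings, and constraints only
remove readings that are impossible in context. The readings and the
toy rules below are invented French-flavoured examples of mine, not the
authors' constraint set.

# Reductionist disambiguation: constraints only remove readings;
# the rules are applied until a fixpoint is reached.
def reduce_readings(sent):
    words = [w for w, _ in sent]
    readings = [set(r) for _, r in sent]
    changed = True
    while changed:
        changed = False
        for i in range(len(words)):
            before = set(readings[i])
            # a DET reading requires a following NOUN or ADJ reading
            if i + 1 < len(words) and not readings[i + 1] & {'NOUN', 'ADJ'}:
                readings[i] -= {'DET'}
            # after an unambiguous DET, a VERB reading is impossible
            if i > 0 and readings[i - 1] == {'DET'}:
                readings[i] -= {'VERB'}
            # sentence-initially, prefer DET over PRON (toy rule)
            if i == 0 and 'DET' in readings[i]:
                readings[i] -= {'PRON'}
            changed = changed or readings[i] != before
    return list(zip(words, readings))

print(reduce_readings([('la',    {'DET', 'PRON'}),
                       ('porte', {'NOUN', 'VERB'})]))
# -> [('la', {'DET'}), ('porte', {'NOUN'})]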

9) Grefenstette: Light parsing as finite-state filtering.
Light parsing, defined as the extraction of pertinent information from
text without performing a full syntactic analysis, is applicable to
tasks such as information retrieval and lexicography. The basic idea is
that sufficient syntactic information can be retrieved with
computationally efficient finite-state methods. Although some previous
work has included components beyond the finite-state, Grefenstette shows
how an entire system can be built using only FS techniques. He makes use
of Karttunen's (1996) regular expression compiler to create efficient FS
filters which, when applied sequentially, mark the relevant syntactic
groups and relations in the text. This is exemplified for two cases,
namely head nouns of noun phrases, and subjects of active verbs.
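	The cascade idea can be shown in miniature. Each filter is a regular
pattern over word/tag tokens; applied in sequence, the first brackets
noun groups and the second marks the last noun of each group as its
head. The patterns and tags are my own illustration, not Karttunen's
compiler or Grefenstette's filters.

import re

NOUN_GROUP = re.compile(r'((?:\S+/DT )?(?:\S+/JJ )*(?:\S+/NNS? )+)')
HEAD = re.compile(r'(\S+)/(NNS?)(?= *\])')   # last noun before ']'

def light_parse(tagged):
    marked = NOUN_GROUP.sub(r'[NG \1] ', tagged + ' ')
    return HEAD.sub(r'\1/\2/HEAD', marked).strip()

print(light_parse('the/DT old/JJ man/NN walks/VBZ'))
# -> [NG the/DT old/JJ man/NN/HEAD ] walks/VBZ
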
	System performance is impressive in terms of operational speed and
error rates of the subject filter mentioned (76% for 50 randomly chosen
sentences out of 8000). A detailed error analysis is provided of the
incorrect test sentences, which fall into four distinct categories.

10) Kornai: Vectorized finite-state automata (VFSAs).
Kornai elaborates a new type of FS automaton in which the state space is
based on vectors (arrays), and describes its application in NewsMonitor,
a system for extracting "relational information" from financial news
feeds. Examples of relevant relations are 'Person, Company, Title'
(PCT) and 'Who bought what?'. Such a task must be accomplished very fast
if it is to be useful to financial decision makers.
	The basic idea of VFSAs is to decompose the state space using
appropriate features or flags, which is suggested by the notational
compactness provided by feature systems in linguistics, and particularly
phonology. A formal definition is postponed until Section 4, after a
discussion of the performance of NewsMonitor in comparison to its
predecessor, RelationalText. By taking only a lazy approach to parsing,
and searching for appropriate relations both across and within sentences,
the new system achieved a staggering 97% precision on the PCT task using
3 minutes of processing time per issue of the Wall Street Journal
(platform not specified). A direct comparison is not possible since the
earlier system obtained deep parses of every sentence, taking 36 hours
per issue on a high-end Symbolics, so unfortunately it is not really
clear how much the performance gain is due to the VFSA technique itself.
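	Kornai's formal definition is too involved to reproduce here, but
the decomposition idea can be caricatured: instead of enumerating the
2^3 product states of a monolithic automaton, the machine keeps a
vector of three flags and each transition updates a single coordinate.
Everything below (task, tags, rules) is a toy of my own, loosely
modelled on the PCT example.

# State is a vector of flags rather than a member of an enumerated
# state set; a transition touches one coordinate (toy PCT detector).
FLAG_FOR = {'PERSON': 'person', 'TITLE': 'title', 'COMPANY': 'company'}

def scan(tagged_tokens):
    flags = {'person': False, 'title': False, 'company': False}
    for token, tag in tagged_tokens:
        if tag in FLAG_FOR:
            flags = dict(flags, **{FLAG_FOR[tag]: True})  # one coordinate
        if all(flags.values()):
            return 'PCT candidate ending at ' + repr(token)
    return 'no complete PCT pattern'

print(scan([('John Smith', 'PERSON'), ('president', 'TITLE'),
            ('Acme Corp', 'COMPANY')]))
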
	Kornai concludes with a discussion of the theoretical implications of
his device and its relation to register vector grammars. A strong
argument is made that although the device displays properties of a
linear bounded automaton, in terms of the time and space complexity of
its operation - demonstrated by its real-time operation in the practical
system - it should be considered a finite-state device.
	A very compact paper that could easily have been extended to twice
its length, this is a challenging read, due in part to its highly
mathematical content in places. Kornai is one of the minority in
applied work who take issues of linguistic and psychological
plausibility seriously, rather than merely supplying the simplistic
answer, as he
puts it: "Hey, it works" (p. 107). But the defence of the psychological
plausibility in the final two paragraphs does not do justice to the
issues (and probably was not intended to). The main point of theoretical
interest for me lies in the justification of the finite-stateness of
VFSAs, where part of the argument implies the subordination of the
Chomsky hierarchy to complexity results of a practical machine
implementation as an evaluation criterion. This would seem to be an
admission that the classification of grammars as finite-state or
otherwise is less important computationally than the time and space
complexity of the resulting grammar (which was shown earlier by Barton,
Berwick & Ristad 1987).

11) Roche: Parsing free and frozen sentences
This paper shows how an FST-based parser can cope with syntactic
phenomena requiring a transformational analysis in terms of deletion in
some versions of generative theory (Harris 1991). Some problems with
deletion in context-free grammars are explained to justify the approach,
and detailed parses of relevant sentences are very clearly presented.
The framework is extended to support-verb (or 'light' verb)
constructions and so-called frozen sentences, involving idioms such as
"kicks the bucket". So far the framework has been applied to a
large-scale grammar of French and a smaller grammar for English.
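	The treatment of frozen expressions can at least be imitated with a
longest-match lookup that treats the idiom as a single analysis, though
a real transducer would also keep the literal reading available. The
dictionary and encoding are my own, not Roche's French grammar.

# Longest-match recognition of frozen expressions over lemmas; a real
# FST would output the idiomatic and literal analyses in parallel.
FROZEN = {('kick', 'the', 'bucket'): 'die (idiomatic)',
          ('pull', 'strings'): 'use influence (idiomatic)'}

def mark_frozen(lemmas):
    out, i = [], 0
    while i < len(lemmas):
        match = max((idiom for idiom in FROZEN
                     if tuple(lemmas[i:i + len(idiom)]) == idiom),
                    key=len, default=None)
        if match:
            out.append((' '.join(match), FROZEN[match]))
            i += len(match)
        else:
            out.append((lemmas[i], None))
            i += 1
    return out

print(mark_frozen(['he', 'kick', 'the', 'bucket']))
# -> [('he', None), ('kick the bucket', 'die (idiomatic)')]
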
	The exposition is mostly very clear and well supported by the
transducer diagrams: it could probably be followed without too much
difficulty by non-specialist readers. The fact that certain types of
transformation can be modelled by FS techniques is potentially of
theoretical interest, but the impact is lessened as current models shift
away from transformations and towards feature checking and the
interaction of constraints. The system would appear to be a competitor
to the principle-based parsers of Berwick et al. (1991) and related work,
based on Chomsky's principles-and-parameters model. It is not clear to
me how, or whether, unbounded dependencies in wh-movement and the like
could be accommodated in Roche's model, nor how easily the techniques
transfer to other languages, a demonstrable advantage of the
principle-based approach.

12) Vilar et al.: Text and speech translation by means of subsequential
transducers
The authors introduce a finite-state transducer model for language
translation, known as a subsequential transducer (SST), and show how it
can be integrated with an HMM-based continuous speech recognition system
to perform speech-based translation between Spanish and English.
Intended target applications are limited domain tasks with restricted
syntax. A novel algorithm is presented for automatic learning of SSTs
from corpus data containing input-output samples. SSTs handle word order
differences simplistically by delaying output words until appropriate
completion of the input phrase, which introduces additional states into
the transducer at a rate exponential in the length of the delay. Using a
technique for automatically reordering the output sentences of the
training corpus, and marking up the changes using brackets, considerable
improvement in word error rates and the size of the resulting
transducers is obtained. Integration with speech recognition is
described and test results for Spanish-English translation from speech
input are presented. The advantages of the new training method are
evident in providing more accurate syntactic language models for the
speech recogniser.
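	The delay mechanism deserves a small illustration. A subsequential
transducer is deterministic on its input, may emit a string on each
transition, and emits one final string when the input ends; holding a
word back in the state is what buys local reordering. The fragment is
my own invention, not one of the authors' learned transducers.

# (state, word) -> (next state, output); SIGMA flushes delayed
# material at the end of the input (toy Spanish-English fragment).
DELTA = {('q0', 'una'):    ('q0', 'a'),
         ('q0', 'casa'):   ('qN', ''),             # delay 'house'
         ('qN', 'blanca'): ('q0', 'white house'),
         ('qN', 'roja'):   ('q0', 'red house')}
SIGMA = {'q0': '', 'qN': 'house'}

def translate(words):
    state, out = 'q0', []
    for w in words:
        state, piece = DELTA[(state, w)]
        if piece:
            out.append(piece)
    if SIGMA[state]:
        out.append(SIGMA[state])
    return ' '.join(out)

print(translate(['una', 'casa', 'blanca']))   # a white house
print(translate(['una', 'casa']))             # a house
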
	As the authors accept, the main bottleneck in the SST training
procedure is the availability of sufficiently large corpora of paired
sentences, but they believe this will be remedied in the near future.

13) Ejerhed: Finite state segmentation of discourse into clauses
The first part of this paper argues from statistical analysis of an
existing Swedish speech corpus that clause boundaries are better
predictors of discourse events, such as pauses and information unit
boundaries, than are phrase boundaries. In the second part a clause
segmentation algorithm based on regular expression rules for clauses
(provided in an appendix) is described, and results of tests on Swedish
unrestricted text are presented. The basis of the algorithm is related
to a formal language theoretic result of Salomaa on k-testable
languages, a subclass of the regular languages. Results with scanning
width k = 4 are around 95% accurate, and nearly all errors were due to
cases which were not covered by the rules, or to tagging errors in the
test corpora.
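	The k-testable connection amounts to this: membership can be decided
by inspecting only the length-k substrings, so clause boundaries can be
placed by a scanner that never sees more than a k-tag window. The
single rule below is an invented English-flavoured stand-in for the
Swedish rules of the appendix.

# Insert a boundary at position i when the k-tag window around it
# matches a rule (the one rule here is an invented stand-in).
K = 4
RULES = {('VERB', 'CONJ', 'PRON', 'VERB')}  # '... slept | and he snored'

def segment(tags):
    cuts = set()
    for i in range(1, len(tags) - K + 2):
        if tuple(tags[i - 1:i - 1 + K]) in RULES:
            cuts.add(i)            # boundary just before position i
    return cuts

tags = ['PRON', 'VERB', 'CONJ', 'PRON', 'VERB']
print(segment(tags))   # {2}: a clause boundary before the conjunction
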
	Although the statistical analysis implies a strong argument in favour
of using clauses rather than phrases as natural language processing
units, no explicit statement of this was made, which made the paper seem
to consist of two almost unrelated halves. Some justification of the
definition of clause adopted would have been helpful to avoid the
conclusion that it may have been ad hoc. As the final point of the
conclusion states, the clause segmentation rules are highly
language-specific and must be written for each language, and tested on
large amounts of empirical data. One problem with this is that in order
to get meaningful results, the definition of clause used by the
segmentation algorithm must be the same as that used in tagging the
corpus, hence experiments using different definitions would be very
laborious.

14) Schulz and Mikolaejewski: Between finite state and Prolog:
constraint-based automata for efficient recognition of phrases
A new type of automaton is described for efficient recognition of
phrases from large corpora, in this case for simple German noun phrases
which involve significant morphological agreement phenomena. These
"constraint-based automata" combine the efficiency of finite-state
techniques with the compactness of grammatical rule statements which is
available in Prolog. The recognition procedure is a two-phase one in
which dictionary lookup and matching are performed exclusively by the FS
component, while unification and constraint solving are performed by the
Prolog-like component. The authors have developed a programming language
for implementing the automata, which takes as input simple Prolog-style
transition rules for describing the agreement system. Preliminary
results on various types of German noun phrase show that considerable
speed advantage is gained over a purely constraint-based approach
implemented in the ECLiPSe language. The concept underlying the
automaton is described quite clearly without too much formal detail. In
general, conceptual and implementational issues are clearly
distinguished, and the conclusions include some suggestions for further
improvement in both areas.
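	The two-phase architecture is easy to mock up: a finite-state pass
checks the category pattern Det Adj* N, after which a constraint pass
intersects the words' agreement feature sets, a non-empty intersection
meaning that the phrase agrees. The lexicon is a drastically simplified
toy of my own, not the authors' programming language.

import re

# Readings are (gender, case, number) triples; agreement succeeds
# when the intersection over all words is non-empty (toy lexicon).
LEX = {'der':    ('DET', {('masc', 'nom', 'sg'), ('fem', 'dat', 'sg')}),
       'kleine': ('ADJ', {('masc', 'nom', 'sg'), ('fem', 'nom', 'sg')}),
       'Hund':   ('N',   {('masc', 'nom', 'sg'), ('masc', 'acc', 'sg')})}

def recognise_np(words):
    cats = ' '.join(LEX[w][0] for w in words)
    if not re.fullmatch(r'DET( ADJ)* N', cats):   # phase 1: finite state
        return None
    agr = set.intersection(*(LEX[w][1] for w in words))  # phase 2
    return agr or None

print(recognise_np(['der', 'kleine', 'Hund']))
# -> {('masc', 'nom', 'sg')}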

15) Bangalore: Explanation-based learning and finite-state transducers:
applications to parsing lexicalised tree-adjoining grammars.
Demonstrates how a hand-crafted wide-coverage grammar (in the LTAG
formalism) can be made portable to specific domains using
explanation-based learning (EBL) techniques. Parsing of LTAGs in a
limited domain is seen as a finite-state transduction from strings to
their associated dependency structures, and it is this relation to which
the EBL technique is applied. An extension to the system in the form of
a lightweight parser, termed a stapler, is also discussed. The work is
carried out using XTAG, a grammar development system freely available
from http://www.cis.upenn.edu/~xtag. Experimental results on the ATIS
corpus are provided.
	This is the longest paper of the volume and very detailed, but clearly
written and laid out. Enough detail is given both of the formalism and
the EBL technique for those not specialised in the area (such as myself)
to follow, although it is hard going in places.

16) Tzoukermann and Radev: Use of weighted finite-state transducers in
POS tagging.
Presents an approach to POS-tagging using statistical FSTs, a method of
combining statistical and linguistic information previously applied in
automatic speech recognition. In this technique, transitions of an FST
are weighted according to their probability of being correct in a given
context. Weighted FSTs are applied both to morphological analysis and
the n-gram language model for word sequences. Simple negative
constraints to rule out impossible sequences are also used in the
system. Performance figures are given, and the authors relate their
research to previous work, explaining why comparison is difficult.
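	In spirit, the approach composes a weighted lexical machine (word to
tags) with a weighted tag n-gram machine and keeps the best-scoring
path, which for bigrams reduces to the familiar Viterbi search. All
numbers below are invented, and the sketch shows only the flavour, not
the authors' system.

import math

TRANS = {('<s>', 'DET'): 0.7, ('<s>', 'NOUN'): 0.3,
         ('DET', 'NOUN'): 0.8, ('DET', 'VERB'): 0.2,
         ('NOUN', 'VERB'): 0.6, ('NOUN', 'NOUN'): 0.4}
EMIT = {('the', 'DET'): 1.0, ('can', 'NOUN'): 0.4, ('can', 'VERB'): 0.6,
        ('rusts', 'VERB'): 0.8, ('rusts', 'NOUN'): 0.2}

def viterbi(words, tags=('DET', 'NOUN', 'VERB')):
    best = {'<s>': (0.0, [])}             # tag -> (log prob, path)
    for w in words:
        step = {}
        for t in tags:
            e = EMIT.get((w, t), 0.0)
            if e == 0.0:
                continue
            step[t] = max((lp + math.log(TRANS.get((prev, t), 1e-9) * e),
                           path + [t])
                          for prev, (lp, path) in best.items())
        best = step
    return max(best.values())[1]

print(viterbi(['the', 'can', 'rusts']))   # ['DET', 'NOUN', 'VERB']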

17) Csuhaj-Varju: Colonies: a multi-agent approach to language
generation.
Describes the application of colonies, systems of cooperating
multi-agent symbol sets, to natural language generation. Each of the
autonomous agents is a regular grammar, and the behaviour of the system
is described by the language generated by the jointly operating
component grammars. Since the resulting language can have a generative
capacity which is non-regular, the interest of this approach lies in the
possibility that certain complex context-sensitive behaviour found in
natural language, such as multiple or crossed agreement, may be
described more succinctly by colonies of simpler grammars acting
cooperatively. Unfortunately most of the article consists of formal
language results and the question is not adequately addressed.
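	The point can nonetheless be demonstrated compactly: two components,
each generating only a finite (hence regular) language, jointly derive
the non-regular language a^n b^n. The encoding of the derivation step
is my own.

# Each component rewrites one occurrence of its start symbol with a
# word from its finite language; cooperation yields a^n b^n.
COLONY = {'S': ['aTb', 'ab'],    # component 1: a finite language
          'T': ['S']}            # component 2

def generated(max_len=6):
    results, agenda, seen = set(), ['S'], {'S'}
    while agenda:
        form = agenda.pop()
        if not any(c.isupper() for c in form):   # terminal string
            results.add(form)
            continue
        for sym, words in COLONY.items():
            pos = form.find(sym)
            while pos != -1:
                for w in words:
                    new = form[:pos] + w + form[pos + 1:]
                    if len(new) <= max_len + 2 and new not in seen:
                        seen.add(new)
                        agenda.append(new)
                pos = form.find(sym, pos + 1)
    return sorted((s for s in results if len(s) <= max_len), key=len)

print(generated())   # ['ab', 'aabb', 'aaabbb']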

18) Nederhof and Bertsch: An innovative finite-state concept for
recognition and parsing of context-free languages
The second paper concerned with formal language theory, this chapter
shows that all languages in the regular closure of the deterministic
languages can be recognised in linear time. This result is important and
relevant to NLP since the resulting class of "meta-deterministic"
languages includes many inherently ambiguous languages, and it has been
claimed previously that such languages cannot in general be processed in
linear time. Extension of the recognition result to parsing algorithms
is also discussed, and parallels are drawn with the work of
Abney (1996) on multi-level finite-state parsers.

19) Ristad: Hidden Markov Models with finite state supervision
This paper describes a supervised training paradigm for HMMs, and gives
algorithms for estimating the parameters of the supervised HMM. The
proposal makes use of a modified HMM with distinguished initial and
final states. This relaxes the requirement of independence between
adjacent observations inherent in the standard model, and allows
training of models without the need for aligned transcriptions, which
should give practical advantages in both speech and handwriting
recognition. No implementation of the approach is discussed.
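	The role of the distinguished states is easiest to see in the
forward computation: the probability of a string is the mass that
actually reaches the final state, rather than a sum over all states.
The parameters below are invented, and this is just the textbook
forward algorithm, not Ristad's estimation procedure.

# Forward algorithm for an HMM with distinguished initial state I and
# final state F (invented parameters).
TRANS = {('I', 'A'): 1.0,
         ('A', 'A'): 0.3, ('A', 'B'): 0.5, ('A', 'F'): 0.2,
         ('B', 'A'): 0.4, ('B', 'B'): 0.4, ('B', 'F'): 0.2}
EMIT = {('A', 'x'): 0.7, ('A', 'y'): 0.3,
        ('B', 'x'): 0.1, ('B', 'y'): 0.9}

def string_probability(obs):
    alpha = {'I': 1.0}
    for o in obs:
        alpha = {q: sum(p * TRANS.get((prev, q), 0.0) * EMIT[(q, o)]
                        for prev, p in alpha.items())
                 for q in ('A', 'B')}
    return sum(p * TRANS.get((q, 'F'), 0.0) for q, p in alpha.items())

print(string_probability(['x', 'y']))   # about 0.0756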

Overall Evaluation
The book, together with the additional material on the CD-ROM, provides a
comprehensive overview of the state of the art in the use of finite
state techniques in NLP. On the whole the editing is good: the papers
are grouped together appropriately, but there are irritating
discrepancies in formatting, as well as a few missing references and
section numbers, which indicate that much of the responsibility was left
to the individual authors. The general level of the book is advanced and
certainly not for the computationally faint-hearted, but this is just
the nature of the field. A few of the papers introduce basic notation
and concepts, indicating that they might have been aimed at a more
general linguistics audience as well.
	In terms of overall content, the main criticism I have is whether such a
diverse set of works can be coherently grouped together by the fact that
they employ finite-state methods. In corpus-based approaches, including
speech recognition, there are simply no computational tools available
other than finite-state ones, hence there is no theoretical commitment
involved in using them. In fact, because of the assumptions on which
they are based, they can be unsuitable for non-mainstream approaches
(e.g. using sub-segmental units in speech recognition with HMM-based
toolkits). In
some cases the finite-state component is but one of several; sometimes
it is there for reasons of implementational efficiency, and sometimes
only a partial or approximate solution is obtainable, e.g. by limiting
domain of interest. Evaluated on their merits as practical solutions to
problems though, there is little to argue with as long as comparisons to
other competing work are given.
	Such work is fundamentally different, however, from the kind that
advocates a finite-state analysis of some natural language phenomenon.
In his introduction Kornai argues that some of the papers (8, 9, 10 &
11) describe phenomena, such as light verbs, which were "in the
tradition of Chomsky (1970) treated as core cases of transformational
grammar" (p. 4). The fact that they no longer are undermines this
argument considerably. Also, because the models are evaluated according
to performance on a specific task, and issues of psychological
plausibility are generally neglected, it is difficult if not impossible
to compare them with transformational approaches. Nor can their
undoubted success in limited practical but artificial tasks be evidence
for their correctness as models of human language. Hence I think the
impact of FS techniques on mainstream syntax has been somewhat
exaggerated by Kornai.

References
Abney, S. (1996). Partial parsing via finite-state cascades. In J.
Carroll (ed.), Workshop on Robust Parsing, pp. 8-15.
Barton, G., Berwick, R. & Ristad, E. (1987). Computational complexity
and natural language. Cambridge, MA: MIT Press.
Berwick, R., Abney, S. & Tenny, C. (eds.) (1991). Principle-based
parsing: computation and psycholinguistics. Dordrecht: Kluwer.
Chomsky, N. (1970). Remarks on nominalization. In Jacobs and Rosenbaum
(eds.), Readings in English transformational grammar, 184-221. Waltham,
MA: Blaisdell.
Harris, Z. (1991). Theory of language and information. Oxford: Oxford
University Press.
Karttunen, L. (1983). Kimmo: A general morphological processor. Texas
Linguistic Forum 22, 165-186.
Karttunen, L. (1996). Directed replacement. In Proceedings of the 34th
Annual Meeting of the ACL, Santa Cruz, CA.
Madelaine, E. & Vergamini, D. (1989). AUTO: A verification tool for
distributed systems using reduction of finite automata networks. In
Proceedings of the FORTE '89 Conference, Vancouver.

Reviewer:
Geoff Williams has a bachelor's degree in engineering and a PhD in
linguistics from the School of Oriental and African Studies. His
research interests are in phonology, speech and language processing,
and computational aspects of language.
-
______________________________________________________________
Geoff Williams			
13 Park Avenue
Chelmsford, Essex CM1 2AB
UK
Tel & Fax: +44-(01245) 261688


---------------------------------------------------------------------------

If you buy this book please tell the publisher or author
that you saw it reviewed on the LINGUIST list.

---------------------------------------------------------------------------
LINGUIST List: Vol-11-285


