16.2330, Review: Computational Ling/Translation: Halliday (2004)

LINGUIST List linguist at linguistlist.org
Fri Aug 5 08:25:23 UTC 2005


LINGUIST List: Vol-16-2330. Fri Aug 05 2005. ISSN: 1068 - 4875.

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews (reviews at linguistlist.org) 
        Sheila Dooley, U of Arizona  
        Terry Langendoen, U of Arizona  

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Naomi Ogasawara <naomi at linguistlist.org>
================================================================  

What follows is a review or discussion note contributed to our 
Book Discussion Forum. We expect discussions to be informal and 
interactive; and the author of the book discussed is cordially 
invited to join in. If you are interested in leading a book 
discussion, look for books announced on LINGUIST as "available 
for review." Then contact Sheila Dooley at collberg at linguistlist.org. 

===========================Directory==============================  

1)
Date: 04-Aug-2005
From: Veena Dixit < veena at cse.iitb.ac.in >
Subject: Computational and Quantitative Studies 

	
-------------------------Message 1 ---------------------------------- 
Date: Fri, 05 Aug 2005 04:21:03
From: Veena Dixit < veena at cse.iitb.ac.in >
Subject: Computational and Quantitative Studies 
 

AUTHOR: Halliday, M. A. K. 
EDITOR: Webster, Jonathan J. 
TITLE: Computational and Quantitative Studies 
SERIES: Collected Works of M. A. K. Halliday 
PUBLISHER: Continuum International Publishing Group Ltd 
YEAR: 2004
Announced at http://linguistlist.org/issues/15/15-3580.html


Veena Dixit, Center for Indian Language Technology, 
Indian Institute of Technology, Bombay, India.

This is the sixth volume of the collected works of Professor M. A. K. 
Halliday, which run to ten volumes. Professor Halliday has had a lifelong 
engagement with language, and these volumes represent the outcome. The book 
portrays the developmental phases of machine translation (MT) from the 
perspective of the Firthian framework of systemic functional grammar. 
Computer technologies have developed considerably since the first article 
of the volume appeared. Nevertheless, the early articles continue to be 
relevant, and not only from a historical point of view. 

SYNOPSIS

The book contains eleven articles divided into three parts. Each part has 
a brief introduction by the Editor. There is an appendix containing a 
trial grammar for a text generation project. The selection of articles 
represents the sequential shift in the focus of the author's interest 
while stressing the continuity and development of themes articulated in 
the 1950s. 

The central theme of the first part is that linguistic analysis grounded 
in sound scientific theory is a prerequisite for any language-oriented 
mechanical task. Such analysis offers language description in 
mutually, unilaterally approximating comparative terms. The author 
proposes that the descriptions of the two languages, source language (SL) 
and target language (TL), should cover the levels of grammar and lexis at 
one end and context at the other. The descriptions can take the form of 
statistical statements giving a quantitative analysis of the occurrences 
of items. Rules for systematically relating the two descriptions should 
be appended to them. The expected relationship between items is one of 
translation equivalence. 

The second part contains six chapters, which continue and develop the 
central propositions of the first part of the book. The linguistic system 
is inherently probabilistic in nature. Grammatics, the theory of grammar, 
has to be paradigmatic. Quantitative analysis throws light on the 
probability of each choice. The basis for the quantitative analysis of 
language is the principle that frequency in a text instantiates 
probability in a system.
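
The principle that text frequency instantiates system probability can be 
illustrated with a minimal sketch. It is not taken from the book: the 
function name and the ten sample clauses are hypothetical, and the estimate 
used is simply the relative frequency of each term.

```python
from collections import Counter

def system_probabilities(observed_choices):
    """Estimate the probability of each term in a grammatical system
    from the relative frequency of that term in a sample of text."""
    counts = Counter(observed_choices)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations to estimate from")
    return {term: n / total for term, n in counts.items()}

# Hypothetical sample: the polarity selected in ten finite clauses.
polarity = ["positive"] * 9 + ["negative"]
probs = system_probabilities(polarity)
print(probs)
```

With a sample this small the estimate is only indicative; the review's 
later point about very large corpora applies.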

Corpus linguistics is as much about theory building as it is about data 
collecting. A corpus provides the methodological means for collecting 
evidence of relative frequencies in the grammar, from which the 
probability profiles of grammatical systems can be established. This is 
the theme of the third part.

EVALUATION BY CHAPTER 

Chapter one: 'The Linguistic Basis of a Mechanical Thesaurus' (1956): The 
chapter exploits the fact that grammar and lexis exhibit a high degree of 
internal determination. Machine translation is defined as a function 
between two given languages. The translation procedure involves 
translation equivalence, equivalence of determining features, and the 
operation of particular determining features in the TL. 

Autonomous analysis and the construction of a mechanical thesaurus are 
needed for MT. Grammar should be viewed as a statistics-based statement of 
lexical redundancy, which can be handled autonomously by the Lattice 
program. A thesaurus is defined as the lexical analogue of a grammatical 
paradigm, in which words are arranged in a contextually determined series 
to achieve translation equivalence as well as contextual equivalence. One 
can abstract the collocational and non-collocational features of context 
from the text.

The proposition is substantiated with examples from Chinese and English.

Chapter two: 'Linguistics and Machine Translation' (1962): This article 
foreshadows the themes developed by the author over the next thirty years. 
There is no analogy between code and message on the one hand and form and 
content on the other. A full description of a language involves categories 
and methods that are peculiar to that language. These categories need to 
be used for stating the patterns of the language and for showing how it 
works. The author introduces the necessary technical categories, such as 
unit, form, rank, and level. The description is complete when the 
independent grammatical and lexical descriptions are shown to be related. 

The author stresses the necessity of quantitative analysis for the 
description of languages. The computer has to translate on a 'more likely 
or less likely' basis rather than on a yes-no basis.
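
As a rough illustration of translating on a 'more likely or less likely' 
basis, the sketch below ranks candidate equivalents by probability instead 
of accepting or rejecting each one outright. The function name, the 
candidate words and their probabilities are all invented for the example; 
nothing here comes from the chapter itself.

```python
def rank_equivalents(candidates):
    """Order candidate target-language equivalents from most to least
    likely, given a mapping of candidate -> estimated probability."""
    return sorted(candidates.items(), key=lambda item: item[1], reverse=True)

# Hypothetical probabilities for three TL equivalents of one SL item.
candidates = {"bank": 0.70, "shore": 0.25, "embankment": 0.05}
ranking = rank_equivalents(candidates)
best_equivalent, best_probability = ranking[0]
print(ranking)
```

A yes-no procedure would keep only one candidate; the ranking keeps the 
less likely options available for contexts that favour them.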

He concludes that the interlingua for translation between pairs or groups 
of languages can be neither a natural language nor a machine language. It 
will have to be a mathematical construct serving as a transit code between 
natural languages. 

Chapter three: 'Towards Probabilistic Interpretations' (1991): Professor 
Halliday starts from a rather distant point by asking how change is to be 
incorporated into the structural linguistic concept of a system. Language 
may have infinite possibilities, but it has a finite number of users. A 
probabilistic model of lexicogrammar enables us to explain register 
variation, which is related to diachronic variation. When a probability 
reaches certainty, a category change has taken place. Every single 
instance alters the probabilities of the system in some measure. 

The difference between physical or biological systems and semiotic 
systems lies in the key concepts of instantiation and realization. In a 
semiotic system, instances have differential qualitative values (referred 
to as the 'Hamlet factor'). As to realization, linguistic systems are 
characterized by stratification. The author wants to escape from the 
constructivist trap.

Chapter four: 'Corpus Studies and Probabilistic Grammar' (1991): The 
chapter is about the theoretical status of corpus frequencies. The author 
rejects Chomsky's theory of competence and performance, since by 
definition it made it impossible for the analysis of actual text to play 
any part in explaining the grammar of a language. He points out that 
corpus studies are a well-established source of information about the 
grammar of a language. A statement about quantitative patterns of grammar 
is not an attack on the individual's freedom of choice in using the 
language. 

Probabilities do not predict single instances; rather, they predict the 
general pattern. Their significance lies in the interpretation rather 
than the prediction of the single instance. It is evident that even 
children construe the lexicogrammar, on the evidence of text frequency, as 
a probabilistic system. 

Consistent with his views on the role of linguistics, Professor Halliday 
holds that lexis and grammar are complementary perspectives and not 
contrastive, opposing or unrelated fields. Each explains different aspects 
of a single phenomenon.

Chapter five: 'Language as System and Language as Instance: The Corpus as 
a Theoretical Construct' (1992): System and instance are a single 
phenomenon, language, seen from the standpoints of two different 
observers. Every instance of text perturbs the overall probabilities of 
the system. The more instances we observe, the better we perform as 
observers of the system. Professor Halliday emphasizes that the corpus 
needs to contain a very large sample of real text. 
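
The idea that each instance perturbs the system's probabilities can be 
sketched as a running estimate that shifts slightly with every new 
observation. The class name and the tense data below are hypothetical, and 
the plain relative-frequency estimate is an assumption made for the sake of 
the example.

```python
from collections import Counter

class SystemObserver:
    """Keeps running counts of the terms selected in one system and
    reports the current relative-frequency estimate for any term."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, term):
        # Each new instance changes the counts, and so the estimates.
        self.counts[term] += 1

    def probability(self, term):
        total = sum(self.counts.values())
        return self.counts[term] / total if total else 0.0

observer = SystemObserver()
for term in ["past", "present", "present", "past"]:
    observer.observe(term)

before = observer.probability("past")  # estimate from four instances
observer.observe("past")               # one further instance of text
after = observer.probability("past")   # estimate has shifted
print(before, after)
```

The more instances have already been observed, the smaller the shift 
caused by any single new one, which is one reason a very large sample 
gives a steadier picture of the system.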

We can check the relative frequencies, and the frequencies broken down by 
register, to test the hypothesis regarding probability typology. We 
need to measure how the probability of selecting one term is affected by 
previous selections made within the same system. It is possible to measure 
the complexity of the language through general measures such as lexical 
density or specific measures such as the length of nominal chains. The 
degree of association between simultaneous systems can be found. Measures 
of conditional probability can give insights into historical linguistics. 
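
One way to measure how a selection is affected by the previous selection 
in the same system is to estimate conditional probabilities from adjacent 
pairs of choices. The sketch below does this under simple assumptions: 
only the immediately preceding selection is considered, and the sequence 
of tense choices is invented for illustration.

```python
from collections import Counter, defaultdict

def conditional_probabilities(sequence):
    """Estimate P(current term | previous term) from a sequence of
    successive selections made within one system."""
    pair_counts = defaultdict(Counter)
    for previous, current in zip(sequence, sequence[1:]):
        pair_counts[previous][current] += 1
    return {
        previous: {
            current: n / sum(followers.values())
            for current, n in followers.items()
        }
        for previous, followers in pair_counts.items()
    }

# Hypothetical primary tense selections in eight successive clauses.
tense_choices = ["past", "past", "present", "past",
                 "past", "past", "present", "present"]
cond = conditional_probabilities(tense_choices)
print(cond)
```

Comparing these figures with the unconditioned relative frequencies shows 
whether a previous selection makes the same term more or less likely.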

The chapter discusses various aspects of the statistical measurement of 
natural language.

Chapter six: 'A Quantitative Study of Polarity and Primary Tense in the 
English Finite Clause' (1993): This chapter is co-authored with Z. L. 
James. The intention was to undertake basic quantitative research in the 
grammar of modern English. The authors decided to access the corpus 
directly using existing programs. They hoped to test the hypothesis that 
grammatical systems fall largely into two types. In one type the options 
are equally probable, and there is no unmarked term in the quantitative 
sense. In the other type the options are skew, one term being unmarked. 
The authors then detail the procedure adopted, the problems faced and the 
important decisions taken during the course of the study.
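
The two-type hypothesis can be made concrete with a small sketch that 
labels a two-term system as equiprobable or skew from its observed counts. 
The tolerance of 0.1 around 0.5 is an arbitrary assumption, the counts are 
invented, and the entropy figure is an added illustration rather than a 
measure described in this review.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits; 1.0 for two equally probable terms."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def classify_binary_system(count_a, count_b, tolerance=0.1):
    """Label a two-term system 'equiprobable' or 'skew' from counts.
    The tolerance is a hypothetical cut-off, not a published value."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no observations to classify")
    p_a = count_a / total
    kind = "equiprobable" if abs(p_a - 0.5) <= tolerance else "skew"
    return kind, entropy_bits([p_a, 1 - p_a])

# Hypothetical counts for two systems in a sample of 100 clauses each.
tense_kind, tense_entropy = classify_binary_system(50, 50)
polarity_kind, polarity_entropy = classify_binary_system(90, 10)
print(tense_kind, tense_entropy)
print(polarity_kind, polarity_entropy)
```

A skew system has lower entropy than an equiprobable one, which is one way 
of expressing that its unmarked term is the expected choice.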

Chapter seven: 'Quantitative Studies and Probabilities in Grammar' (1993): 
According to Professor Halliday, corpus linguistics modifies our thinking 
about theoretical linguistics. He maintains that quantitative studies 
have brought some interesting patterns to light. Any concern with 
grammatical probabilities makes sense only in the context of a 
paradigmatic model of grammar. 

Systemic functional corpus studies investigate systemic variation in 
patterns of meaning on the plane of content rather than on the plane of 
expression. The studies investigate the internal relationship between two 
systems within the grammar in terms of their interdependencies and their 
logical-semantic relationship.

In the second half of the chapter, the author discusses the factors that 
identify grammatical systems for investigation and the decisions taken 
during the study. He sets out the procedures adopted and the observations 
made, together with an analysis of inaccuracy and the steps taken to deal 
with errors and omissions. He holds that the analysis should be valid 
when applied to any natural text.

Chapter eight: 'The Spoken Language Corpus: A Foundation for Grammatical 
Theory' (2002): The author holds that it is only in spoken language that 
the full semantic potential of the system is brought into play, and that 
new insights into the theory of language as a whole flow from it. 

The metaphor 'reducing spoken language to writing' suggests that some 
features, such as melody and rhythm, are lost in transcribing the spoken 
variety. Transcription should be faithful to the essential natural 
features of the spoken variety, which are functional in carrying meaning. 

With some reservations, the author accepts the distinction between 
'corpus-based' and 'corpus-driven' descriptions; both essentially need to 
be theory-based. He describes structure as the theory of the syntagm and 
system as the theory of the paradigm. 

He concludes that grammatical probabilities, both global and local, are an 
essential aspect of 'what language really is and how it works'. The 
discussion is supported by a few interesting examples and the results of 
spoken corpus studies.

Chapter nine: 'On Language in Relation to Fuzzy Logic and Intelligent 
Computing' (1995): The author expresses the need for a systemic analysis 
of language for MT rather than reliance on commonsense knowledge about 
language. After detailing the distinct features of language as a semiotic 
system, he summarizes the complexity of language. The complexity arises 
because the systems are not fully independent but relate to one another. 
Nor do they form any kind of strict taxonomy. There are various degrees 
and kinds of partial association among the systems. Thus, there is a great 
deal of indeterminacy, both in the systems and in their relationships. The 
overall picture is notably fuzzy. It is essential to account for the 
fuzziness of language, its disorder and complexity, not as accidental and 
aberrant, but as systemic and necessary to convey meaning. 

Finally, he outlines the basic principles adopted in attempting to 
theorize about language. He wants to formulate grammar paradigmatically, 
contextually, functionally and fuzzily. Examples are used to illustrate 
the principles of systemic modeling.

Chapter ten: 'Fuzzy Grammatics: A Systemic Functional Approach to 
Fuzziness in Natural Language' (1995): This chapter is about the role of 
grammar when natural language is to be used as a metalanguage for 
intelligent computing. The basic metafunctions of natural language are 
ideational, interpersonal and textual. The ideational metafunction 
construes experience, which can be material, mental, verbal or relational. 
The interpersonal metafunction enacts social relationships, and the 
textual metafunction creates discourse. Metafunctions are comprehensive, 
extravagant, telescopic, non-autonomous, variable and indeterminate. 
Rhetorical toning, indistinctness, unexpectedness, logogenesis, 
complexity, irrelevance, jocularity and error are some of the problem 
areas of natural language as metalanguage. 

The author expresses the need to model language reality in terms of 
tendencies rather than in terms of categories. This makes it possible for 
natural language to be its own metalanguage.

Chapter eleven: 'Computing Meanings: Some Reflections on Past Experience 
and Present Prospects' (1995): MT began in the 1950s with the premise that 
the approach had to be mathematical and logical. It was only in the 
mid-1960s that the phenomenon of language came to be seen as autonomous. 
In the 1980s, language came to occupy centre stage and computers became a 
tool for linguistic research. Now research is at a stage where we can 
think of computers functioning through the medium of natural language. It 
was recognized that a word has its meaning only within the total meaning 
potential of the language. 

For intelligent computing to succeed, we will have to align language and 
knowledge on the one hand and instance and the system of which it is an 
instance on the other. Professor Halliday then summarizes those points of 
linguistic complexity that will have to be taken into account if computing 
with natural language is to succeed. 

When computing involves operating with natural languages, we will 
finally be computing meaning. 

CRITIQUE

A general theme runs through the book. Language is described as made up of 
choices of alternative patterns. It is therefore inherently probabilistic. 
Different aspects of the same issues are discussed in appropriate contexts 
over different chapters. Many times the author draws on some probability-
based results to support his hypothesis. 

The theoretical statements regarding sentence equivalence are not 
supported by adequate discussion in chapter two.

In chapter three, the author supports his propositions by discussing 
fieldwork on child language acquisition as well as cognitive processes 
relating to language. This makes the propositions more meaningful.

It is stated that cause and effect in the case of physical systems are 
directional. However, the author has not considered whether this also 
holds for human perception. 

The Firthian concept of 'system' in chapter four provides the necessary 
paradigmatic base for corpus-based probabilistic studies of language.

In chapter six, the conclusions are tabulated. These conclusions are not 
always short and sharp answers. 

I personally disagree with the following statement in chapter 
nine, "Literate, educated adults no longer have access to commonsense 
knowledge about language; what they bring to language are the ideas they 
learnt in primary school, which have neither unconscious insights of 
everyday practical experience nor the theoretical power of designed 
systematic knowledge" (p. 197). It appears to me that a person can improve 
an acquired language by constant access to contemporary knowledge about 
language. The capability for second language learning may support this 
view.

Chapter ten offers no justification for excluding ungrammaticality from 
the formal model of language. It is generally accepted that a linguistic 
description, to be complete, has to account for ungrammaticality.

It should make us pause and think that school education is sufficient for 
day-to-day language use but not adequate for MT. Is this inadequacy merely 
the difference between the use and the explanation for the use?

Can we relate the difference between the patterns of the spoken and 
written versions of a language to the gestures, facial expressions and 
body language that accompany spoken language?

Perhaps corpus linguistics can be usefully supplemented by a study of 
forms of non-verbal communication.

CONCLUSION 

This unusual book displays Professor Halliday's varied concerns and his 
endeavor to give linguistics, and probabilistic corpus studies in 
particular, a central role in MT. While illuminating the developments, he 
provides insights into, and linkages with, different contemporary 
subjects. 

On reading the book, the reader cannot but feel that it is only on the 
development of a comprehensive theory of meaning that computational 
linguistics can finally come into its own. 

REFERENCES

Chomsky, Noam (2004): New Horizons in the Study of Language and Mind, 
Cambridge University Press

Dash, Niladri (2004): Corpus Linguistics and Language Technology, Mittal 
Publications, New Delhi 

ABOUT THE REVIEWER

The reviewer holds an M.A. in Linguistics and is pursuing a Ph.D. on 'Word 
Sense Disambiguation'. She is engaged in research on the less-studied and 
resource-poor language Marathi, the state language of Maharashtra State 
in India. She is a significant contributor to the development of a 
Morphology Rule-Based Spellchecker for Marathi. At present, she is working 
on a Rule-Based Part-of-Speech Tagger for Marathi. She is participating in 
the development of a Wordnet for Marathi. She has undertaken to design a 
course for learning Marathi as a second language. Her lectures on 
morphology are available on the net. She has presented her work at 
national and international conferences.




