9.342, Disc: NLP and Syntax

Sat Mar 7 17:34:18 UTC 1998

LINGUIST List:  Vol-9-342. Sat Mar 7 1998. ISSN: 1068-4875.

Subject: 9.342, Disc: NLP and Syntax

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Editors:            Brett Churchill <brett at linguistlist.org>
                    Martin Jacobsen <marty at linguistlist.org>
                    Elaine Halleck <elaine at linguistlist.org>
                    Anita Huang <anita at linguistlist.org>
                    Ljuba Veselinova <ljuba at linguistlist.org>
                    Julie Wilson <julie at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/

Editor for this issue: Martin Jacobsen <marty at linguistlist.org>

=================================Directory=================================

1)
Date:   Tue, 3 Mar 1998 11:15:11 -1000
From:  "Philip A. Bralich, Ph.D." <bralich at hawaii.edu>
Subject:  Re: 9.310, Disc: NLP and Syntax

2)
Date:   Thu, 5 Mar 1998 09:57:36 -1000
From:  "Philip A. Bralich, Ph.D." <bralich at hawaii.edu>
Subject:  Re: 9.328, Disc: Natural Language Processing and Syntax

-------------------------------- Message 1 -------------------------------

Date:   Tue, 3 Mar 1998 11:15:11 -1000
From:  "Philip A. Bralich, Ph.D." <bralich at hawaii.edu>
Subject:  Re: 9.310, Disc: NLP and Syntax

Sam Bayer wrote,

>Relative to the Derek Bickerton/Philip Bralich parser adequacy
>criteria: the field of computational linguistics has spent quite a
>number of years developing evaluation criteria for parsers, which I
>recommend looking at before you start reinventing the wheel. See the
>journal Computational Linguistics for the last five or six years or
>so, or for a summary, you can read the chapter that my coworkers and
>I wrote on comparing the theoretical and corpus-based computational
>enterprises, in a book edited by John Lawler called Computers and
>Linguistics, due out in April.

Yet his organization as well as all others have yet to be able to
create a BracketDoctor--a device that generates trees and labeled
brackets in the accepted style for this industry, the style of the
Penn Treebank II guidelines, or a MemoMaster, a device that increases
by many thousands the number of commands that are possible for a
speech rec navigation and control system.  In addition, the standards
that I propose are largely based on functionality that most people
have assumed that parsers and theories of syntax had handled years
ago.  The standards I have proposed are meant to demonstrate that
there is a serious problem in the field; that is, very basic levels of
functionality (see below) that many believed had already been
achieved, simply have not been reached.  Standards proposed by MUC,
Computational Linguistics and so on do not address the fact that they
are not requiring their members to meet those very basic standards.
Rather there is an end-run around that expectation that takes one to a
world of non-existent parsers that are meeting standards that are "out
there somewhere" among five or six years of publications (no dates or
page numbers), and an unpublished book.  How can standards of any sort
be of value if the ones I have proposed (reprinted below) have not yet
been met?  These basic levels of functionality should be met long
before anyone attempts comparisons or evaluations of different
systems.  If you just look at them you will see they make a good
qualifying round from which to begin the discussion of "mature
parsers" and "mature theories of syntax."

Before reading the standards I propose below, please note that there
is no list of standards given in Mr. Bayer's letter, just a reference
to standards that have supposedly been dispersed "somewhere"
throughout five or six years of this publication or in a yet to be
published book.  There is also no reference to working parsers that
could meet any standards therein proposed and certainly no discussion
of parsers that could actually meet the standards I have proposed.
Instead of paging through those many periodicals, ask yourself one
question. "Why isn't there a single reference to a list of standards
in this field or from this journal such as proposed below?"  The
answer I think is that there is a painful awareness on the part of the
computational linguistics community of their lack of success after 35
years in which millions of dollars and hundreds maybe thousands of man
years have been invested.  Certainly, there is not the pride that I
feel in asserting in black and white in a list appended to this
message exactly what is possible in this field, and the value it will
have not only for my company but for creating jobs and projects for
students and linguists in this area for many years to come--not only
in English, but in all the languages of the world.  It is not wise for
the field overall to shun the one parser that shows any promise at all
of making good on 35 years of empty promises.  All that does is
guarantee that in the long run the jobs, the projects, the profits,
and so on will all belong to Ergo.  I would like to see this company
continue to profit in this field certainly, but I will feel guilty if
the entire field of computational linguistics tosses the whole of the
jobs and projects into our laps simply because they were unwilling to
admit they cannot meet the standards we propose.

Let's not pretend that vagueness, obscurity, and assertions of the
self-evident wisdom of certain propositions do anything other than
indicate failure on the speaker's part.  Put the evidence in black and
white on a one-page sheet and send it to this list, if you want to
gain any credibility at all with the readers. All I am asking is a
list, similar in size to the one I have proposed, based on your
expertise in this field.  Show the readers of this list that there are
indeed standards in the industry and people are actually meeting them,
and that my efforts are truly unnecessary.  However, I doubt that
there will be any such standards posted or any mention of parsers
(other than Ergo's) that are meeting our or others' standards.  I also
doubt anyone expects the slightest evidence to be forthcoming from
anyone besides Ergo.  Or better still show that there are parsers that
exist that meet and then go beyond the standards I have proposed or
those standards that already exist spread out somewhere over the pages
of five or six years of Computational Linguistics.  Do not insult the
readers' intelligence with a pretense to standards that exist somewhere
across six years of that journal or in a yet to be published book (the
web site shows little promise of standards actually existing there).
Just copy them from one of the published reports you are aware of and
print them for all to see along with the names and locations of the
parsers that conform to them.

Phil Bralich
Ergo's tools and downloadable products can be found at:
http://www.ergo-ling.com
The standards described here are on-line at
http://www.vrml.org/WorkingGroups/NLP-ANIM

THE STANDARDS: See LINGUIST 9.305, Message 3

-------------------------------- Message 2 -------------------------------

Date:   Thu, 5 Mar 1998 09:57:36 -1000
From:  "Philip A. Bralich, Ph.D." <bralich at hawaii.edu>
Subject:  Re: 9.328, Disc: Natural Language Processing and Syntax

On Tue, 03 Mar 1998 14:57:46 +0100, Pius ten Hacken
<tenhacken at ubaclu.unibas.ch> wrote:

>A syntactic theory is part of linguistics as an empirical science.
>Empirical sciences are concerned with explaining chosen aspects of a
>domain of observations in the real world. In the case of linguistics,
>the domain is natural language, but different aspects of language can
>be chosen as a goal for explanation, e.g. acquisition in Chomskyan
>linguistics, processing in LFG. The success of a linguistic theory
>depends on the degree to which an explanatory account is reached. The
>fact that different linguistic theories take different questions
>about language as a basis for research implies that in some cases a
>common ground for evaluation is missing. The following article gives
>an analysis of a number of theories along these lines and of the type
>of misunderstanding occurring when adherents of different theories
>are in discussion:

This is quite true, but how can you actually study acquisition or
processing if you have not yet properly isolated or described what it
is that is being acquired or processed?  You cannot talk about the
acquisition of language or the processing of language if you are still
not able to properly describe parts of speech, parts of the sentence,
statements, questions, subjects, and so on and the relationships
between them.  You simply have not completed the preliminaries.  I am
not saying that syntax is all of linguistics any more than I am saying
that isolating the periodic table is the whole of chemistry, but
without the work being substantially completed on those basics neither
field is really ready to begin.  Any theory that attempts to study
these "domains of observation in the real world" must first
demonstrate that it has at least isolated and described these domains
in some tangible sense.  My main argument is that the entire field is
starting off at the wrong end of the stick much the way alchemy did
before the building blocks of chemistry were properly isolated and
described.  Acquisition and processing require a thorough
understanding of syntax before they can be begun.  Certainly, the
alchemists had a right to study and hypothesize, but chemistry was not
able to evolve out of the superstitious until a proper understanding of
the building blocks of chemistry and the relationships between them
was achieved.  The standards that I propose merely seek to isolate and
describe the building blocks of syntax that anyone must relate to
before they begin any study such as acquisition and processing.

>ten Hacken, Pius (1997), 'Progress and Incommensurability in
>Linguistics', Beitraege zur Geschichte der Sprachwissenschaft
>7:287-310.

>Instead of addressing the explanatory nature of linguistic theories,
>Bralich only considers their descriptive qualities. In this way he
>neglects the purpose of linguistic theories, so that he has no right
>to judge them on these standards.

My point above is relevant here.  I do restrict myself to basic
descriptions, but what I am saying is that until these basics have
been accounted for no theory of syntax is complete or "mature" and no
other area of inquiry can say it is properly grounded in the
basics (the periodic table) of the field.  Simply put, no one is
ready to do the work of many other areas until these basics have
been handled to a large degree.  There is a tacit assumption on the
part of researchers that they are basing their work on a knowledge of
what a sentence or a subject is, but as I have shown very little has
actually been achieved in this area.  Acquisition studies first must
describe what it is that is being acquired and must do so in at least
the minimal degree that I have described in those standards.

>Independently of any evaluation of a syntactic theory qua theory of
>linguistics, we might consider their usefulness in Computational
>Linguistics (CL). Since the application in CL is not an aspect chosen
>by any of the major theories of linguistics, the results of such an
>evaluation do not affect the extent to which a theory reaches its
>explanatory goal. Of course the evaluation is relevant to CL, but
>only in the sense that it is practical to have an applicable theory,
>not in the sense that linguists do not do their job properly
>otherwise.

Yes, I agree with that; however, that is not really the basis of my
proposal to use CL as an independent objective means of evaluating
theories of syntax.  My point instead is this: if we are to study
anything at all of language (acquisition, processing, and so on) we
must first have a proper description of what language is in its most
basic form; otherwise, it is possible to say virtually anything
without being held to account for it.  Thus, we need some independent,
objective means of evaluating whether or not the theory of syntax on
which the theories of acquisition and processing are based has indeed
demonstrated that it has accomplished the basic task of isolating and
describing these primes of language.  Then, given that
every theoretical mechanism ever proposed and every theory ever
proposed can in principle be implemented in a programming language,
CL becomes a very natural and a very useful tool to check the
theory of syntax to see if indeed the theory and its theoretical
mechanisms can account for the facts of structure of a particular
language.

>By the way, the evaluation criteria Bralich proposes look rather like
>a design specification to me. They depend on a (to my mind) highly
>specific, not so straightforward analysis of the parsing problem. If
>there is an underlying theory for the choice of specifications I
>would expect it to generate a corresponding set of specifications
>for, say, French.

Please look closely at those standards (repeated below); they are based
on what anyone might assume a theory of syntax should be able to
describe: parts of speech, parts of a sentence, statements, questions,
actives, passives, tenses, internal clauses, and so on.  And they are
based on what anyone would assume demonstrates that a proper, minimal
description of what language is (from a structural point of view) has
taken place and thus that particular theory can serve as a valid
ground for studies of acquisition, processing, and so on.  And if the
theoretical mechanisms can in principle be implemented in a
programming language then that implementation serves as a
demonstration of that theory's maturity and its readiness for use in
creating a theory of acquisition or processing.  Any theory that
cannot be so programmed would not be a valid basis for work in these
other domains.

The set of specifications would of course remain very much the same
for other languages, but language-specific idiosyncrasies
would have to be added to demonstrate that the theory could indeed
handle the problems of that language.  Ergative languages, for
example, would require some specialized treatment.

>At any rate it does not seem good scientific practice to me to apply
>one's own design specifications as evaluation criteria for competing
>products without stating so explicitly.

Actually, the standards came first and the programs later.  That is,
the standards were decided on as the minimum set necessary to
demonstrate that our theory of syntax had indeed accomplished the
basic description that is required before going further. The standards
were chosen to help a wide audience understand what syntax and NLP
are.  In deciding on these, I looked very closely at what is the basic
domain of description that would demonstrate to everyone
(syntacticians, programmers, marketers, linguists without a background
in syntax, and so on) that a proper description of English had been
done.  Here again, I can only ask the reader to take a close look at
the standards and ask him/herself if it really is alright to excuse
any theory of syntax (or parser) from this very basic level of
description.  I have so far received no comments as to gaps or
excesses in the standards. The only comment to date has been that they
seem to be specific to our abilities, but I believe that a close look
will reveal they are the basics that anyone must handle before
proceeding on to other areas of investigation.

Phil Bralich
Download BracketDoctor
or see demo at          http://www.ergo-ling.com
See standards at        http://www.vrml.org/WorkingGroups/NLP-ANIM

THE STANDARDS: See LINGUIST 9.305, Message 3

---------------------------------------------------------------------------
LINGUIST List: Vol-9-342