Reply to Mayo, Saraswat on linear logic power

Ron Kaplan kaplan at nias.knaw.nl
Mon Jun 10 15:35:05 UTC 1996


Bruce and Vijay both commented on my original message, now a week or so old,
from a computer science perspective.  Here are some brief thoughts in reply...

Bruce said:

>        1: Arguments against powerful formalisms on the basis of their
>complexity/poor efficiency are misleading. Whatever formal representation
>you use as input to a model-building implementation (whether computer or
>brain simulation), it has to be passed through a compiler that, using lots
>of optimization techniques, can reduce the problem to something like its
>inherent complexity. E.g., a good compiler will turn a determinate finite
>automaton described in Prolog (based on a formalism that is O(exp n))  into
>a program that is merely O(n). (There has to be a compiler - I don't think
>anyone wants to say that the brain is actually a linear logic engine, do
>they?) The formalism is to help people understand the problem, not for
>actually doing the computation.

The last sentence here really bears on the point that I was trying to make,
that the formalism and, indeed, the whole theory, is aimed at helping people
understand the problem.  The question of whether the formalism is directly
interpreted or compiled into some simpler computational framework is a
separate (and also important) question that focusses on implementation
(computer or brain) rather than specification or characterization.  Here I
just want to point out that compilation is not always the best way to go.

Just from the engineering perspective, converting a nondeterministic
finite-state machine to a deterministic one, for example, may reduce the
time of processing but may also cause the size of the specification to blow
up exponentially.  There may be some circumstances when leaving it in
nondeterministic form is a much better thing to do.
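To make the size trade-off concrete, here is a minimal illustrative sketch in
Python (a textbook example of my own choosing, not anything from the earlier
discussion): the nondeterministic machine for "the n-th symbol from the end is
an a" needs only n+1 states, while the standard subset construction produces
on the order of 2^n reachable deterministic states.

    def nfa_nth_from_end(n):
        """NFA over {a, b} accepting strings whose n-th symbol from the end
        is 'a'.  State 0 loops and nondeterministically guesses the critical
        'a'; state n is the sole accepting state.  Only n + 1 states in all."""
        delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}}
        for i in range(1, n):
            delta[(i, 'a')] = {i + 1}
            delta[(i, 'b')] = {i + 1}
        return delta, {0}

    def determinize(delta, start, alphabet=('a', 'b')):
        """Standard subset construction; returns the reachable DFA states."""
        start_set = frozenset(start)
        seen = {start_set}
        stack = [start_set]
        while stack:
            state = stack.pop()
            for sym in alphabet:
                succ = frozenset(q for s in state
                                 for q in delta.get((s, sym), ()))
                if succ not in seen:
                    seen.add(succ)
                    stack.append(succ)
        return seen

    for n in range(1, 9):
        delta, start = nfa_nth_from_end(n)
        print(n, len(determinize(delta, start)))   # 2, 4, 8, ...: grows as 2**n

The nondeterministic specification is exponentially more compact, even though
running it directly means tracking a set of states on each input symbol rather
than a single one.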

Also, it is not always clear that the reduction from a powerful formalism to
a simpler implementation system always brings a benefit.  The reduction
typically loses the structure of the original problem, and that structure
may be a key to certain efficiencies.  For example, we know that all the
rules in a context-free grammar that are not involved in a self-recursion
can be "compiled" out into finite-state machines, with the phrase-structure
between self-recursions being flattened out. But given that some
context-free nondeterminism still remains, this simplification may remove
some of the hints that well-known parsing algorithms (e.g. chart parsing)
take advantage of to convert an exponential nondeterministic search into a
polynomial one.  By eliminating various levels of grammatical structure, we
are also eliminating the grammar's characterization of "natural" units of
substructure, those that are likely to be useful on search paths other than
the one that originally caused them to be discovered.  A direct
interpretation of the higher-level (CFG in this case) formalism can
recognize that certain units should be remembered in an "intelligent" cache,
so that processing time can be dramatically reduced at the expense of a
relatively small amount of on-line memory.
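As an illustration of this kind of constituent caching, here is a minimal
chart-style (CYK) recognizer sketch in Python; the grammar and lexicon are a
toy attachment-ambiguity example of my own, not drawn from any actual LFG
implementation.  Each constituent found over a span is recorded once in the
chart and reused on every other search path that needs it.

    from collections import defaultdict

    def cyk_recognize(words, lexicon, rules, start='S'):
        """Minimal CYK recognizer.  lexicon maps a word to its categories;
        rules maps a pair (B, C) to the categories A with a rule A -> B C."""
        n = len(words)
        chart = defaultdict(set)       # (i, j) -> categories spanning words[i:j]
        for i, w in enumerate(words):
            chart[(i, i + 1)] |= lexicon.get(w, set())
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):                 # every split point
                    for B in chart[(i, k)]:
                        for C in chart[(k, j)]:
                            chart[(i, j)] |= rules.get((B, C), set())
        return start in chart[(0, n)]

    # Toy grammar with prepositional-phrase attachment ambiguity.
    lexicon = {'I': {'NP'}, 'saw': {'V'}, 'stars': {'NP'},
               'with': {'P'}, 'telescopes': {'NP'}}
    rules = {('NP', 'VP'): {'S'}, ('V', 'NP'): {'VP'}, ('VP', 'PP'): {'VP'},
             ('NP', 'PP'): {'NP'}, ('P', 'NP'): {'PP'}}
    print(cyk_recognize('I saw stars with telescopes'.split(), lexicon, rules))  # True

Shared pieces such as the PP "with telescopes" are built once and then reused
under both the verb-phrase and the noun-phrase attachment; it is exactly this
economy over natural units of substructure that flattening away the
phrase-structure levels would make harder to state.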

This is all by way of indicating that such questions about implementation,
whether for engineering purposes or for psycholinguistic modelling, do not
have obvious answers and require a sustained program of careful analysis and
experimentation (both computational and psychological). 

Going back to the question of whether linear logic should be used for other
aspects of linguistic description, I think Bruce is observing that there is
not always an implementation cost of using a formalism that is
inappropriately powerful.  This is certainly a valid point, and would be
helpful if we had other good reasons to use the more powerful formalism and
then wanted to overcome an objection based on implementation difficulties.
But by itself it is not an argument in favor of the more powerful
formalization, especially if a more restrictive one is already known.

Vijay Saraswat, in his reply, seemed to be making some points related to
Bruce's. At 08:06 AM 5/31/96 PDT, he wrote:

>Why can't we have our cake and eat it too?
>
>Have the specific structures (for syntax, semantics, inter-relationships) that
>seem to be the ``appropriate level'' for analysis. Different notation, even, is
>sometimes very useful in keeping different ideas apart. Have the
>``compilation'' into some underlying ``universal'' formalism like linear logic,
>for ease of implementation, whatever. Use whichever view gives you more
>insight into the matter at hand.
>
>Case closed. No?

Of course, if it is shown that there is some advantage to compiling into a
particular target formalism (Vijay seems to think of going up in power
whereas Bruce seems to think of going down), then that might be a good thing
to do.  But the mere fact that the target formalism is universal is not, at
least for some purposes, a good motivation.  It needs to be shown that the
universal translation does in fact give you more or different insight into
the matter at hand--that can't be taken for granted.

We may well be able to have our cake and eat our cake too--but it has to be
shown that the best cake for having is also the best for eating.

Bruce continued...

>        2: Arguments against powerful formalisms based on their
>representational vagueness seem to me, however, quite legitimate (I think
>this was Ron Kaplan's main point). The more powerful the formalism, the
>less it tells you about the thing formalized.

There may be some things that can only be described by very powerful
formalisms, in which case we are simply stuck with the vagueness.  The
mapping from f-structure to semantics, for example, may be one of them.  But
the question here is why we don't replace a weaker (and, by hypothesis,
adequate) formalism for other parts of the syntactic mapping by the more
powerful one, in the interests of uniformity.  I was trying to argue that uniformity by
itself is not a particularly good motivation, and it was not the major
motivation in defining the LFG architecture.

[In their messages, Avery and Mark Johnson both suggest an alternative
motivation for exploring this issue:  it may be that new formal devices,
whether or not they are more powerful than existing ones, will offer new and
better insights into what's going on.  That's certainly true and I think a
reason much better than power or uniformity for investigating new
applications of linear logic.] 

>         From the standpoint of building a working model of a natural
>language processor, I think the overall formal structure has to present
>a good match to the structure of the available linguistic data gathering
>procedures. In a natural language system you're going to have vast
>quantities of often arbitrary data - it's not for nothing that we spend
>years and years learning languages - and you've got to be able to check
>empirically available facts quickly and reliably against their formal
>representations. So if you can get good and reliable data about surface
>constituency in a language, context-free production rules give you a good
>shorthand for those data. If you have clear functional valency data,
>LFG-style argument structures can express these clearly. Keeping the formal
>power of each of these domain formalisms low helps the data gatherers avoid
>mistakes because they can't say things that wouldn't be meaningful in the
>domain.

Presumably, we ought to derive this benefit only to the extent that the
architecture of the theory and the phenomena it is intended to account for
are closely aligned.  Having a context-free component in the theory wouldn't
be particularly useful if there weren't a corresponding set of dependencies as
an empirically isolable natural class.  And if the phenomena do naturally
divide, then it seems that this is something that the most explanatory
theory should give a formal account of.  And that feeds back to the
practical issue that Bruce has pointed out.   

>        As a help for the compiler writer, it would be good to know about
>the formal complexity of each of the descriptive domains, but it would be
>very troubling indeed if the complexity of the whole system - sentence
>surface structure to conceptual structure - turned out to be anything less
>than fully Turing-equivalent, would it not?

I don't think I would be "troubled" if this turned out to be the case.
Rather, I would be pleasantly surprised.

--Ron