Re: LFG formal issues (was: Ignore previous)

kaplan at parc.xerox.com
Wed May 16 00:09:50 UTC 2001


Hi Carl,

  I'm finally getting back to some of the formal questions you raised
when I asserted that I intended sets in LFG to be interpreted as
ordinary set-theory sets.  The trail is perhaps a little cold, but your
questions get at some interesting differences between the enterprise you
are engaged in and what we've been trying to do in LFG.  I hope the
following response highlights some of the issues without getting too
technical.  First there is a somewhat long preamble, followed by a
consideration of what you suggest are potential inconsistencies in the
way sets are used in the Dalrymple/Kaplan paper.

You wrote:
    I think in formal theorization about ANYTHING, one ALWAYS tries
    to use the simplest math possible - you only go to harder math if
    you can't get the easier math to work. Conversely, if you see how
    to get easier math to work, you go for it. The latter, in my
    opinion (which Drew Moshier is sure to disagree with) is
    exemplified by HPSG moving from an early 1980's `partial
    information' `unification-based' formalism based on Scott domains
    (Pereira and Shieber, Moshier, Rounds and Kasper) and
    intuitionistic logic to the mid-to-late 1980's `total-object'
    `constraint-based' formalism based essentially on partial
    algebras (Smolka, Johnson, King). If LFGs were written in a
    model-theoretically interpreted formal language (maybe they have
    been and I just didn't know), then maybe it would be appropriate to
    say that this is what LFG was doing all along (but, given the way
    that minimal models come into LFG, maybe not, since this requires
    essentially thinking of f-structures with different degrees of
    instantiation being ordered by subsumption, which fits more with
    the unification-based, intuitionistic setup).

Actually, the original LFG theory as presented in Kaplan/Bresnan 82 was
really quite different from early HPSG in exactly this way.  LFG was in
fact written in a model-theoretic formal language (albeit one mostly of
my own devising), it was not constraint or unification-based (I've never
used the word unification in descriptions of LFG--in the 40 page
Dalrymple/Kaplan paper, the word only appears once or twice, and then
only in descriptions of other theories), and it did indeed have a model
space of 'total objects', not the partial objects of unification
theories.  So the LFG formalism did not have an operational,
unification-like semantics that would indicate how to build up objects
as you get more information about them.  The idea, rather, was that the
set of formulas derived from the c-structure could be used to test
whether a given, completely specified f-structure was or was not a
model (I think I used the word "solution" instead of "model") of those
formulas.  I pointed out that the f-structure formalism (the
f-description language) was a variant of the quantifier-free theory of
equality, a fact that Mark Johnson investigated in much greater detail
in his thesis.  I also gave a satisfiability algorithm based on the
congruence closure algorithm of the QFTE--that algorithm resembles the
unification operation of other theories, but in LFG it was used only as
an implementation method and not to assign meaning to the formalism
(that was done by the conditions of satisfaction).
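
Just to make the direction of fit vivid, here is a tiny sketch in Python,
entirely of my own devising (the dict representation and the names denotes
and satisfies are illustrative assumptions, not anything from KB82): the
candidate f-structure is a complete, given object, and satisfaction is just
a test of that object against a fixed set of equations.

    # A purely illustrative sketch: an f-structure as a finite tabular
    # function from attributes to values, rendered here as a Python dict,
    # and satisfaction as a test of a complete candidate structure against
    # a fixed f-description.

    def denotes(f, path):
        """Denotation of a designator such as (f SUBJ NUM), given as a path."""
        for attr in path:
            if not isinstance(f, dict) or attr not in f:
                return None              # the designator denotes nothing
            f = f[attr]
        return f

    def satisfies(f, equations):
        """Is the complete f-structure f a model ("solution") of every equation?
        Each equation pairs an attribute path with a required value."""
        return all(denotes(f, path) == value for path, value in equations)

    # The description {(f NUM)=SG, (f CASE)=NOM} is satisfied by this total
    # structure; nothing is built up incrementally and no unification is used.
    f = {"NUM": "SG", "CASE": "NOM", "PRED": "pro"}
    print(satisfies(f, [(("NUM",), "SG"), (("CASE",), "NOM")]))   # True

A unification-like procedure could be one way of finding such a structure,
but it plays no role in saying what the formulas mean.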

Historical digression, perhaps of interest to some:

Satisfiability of total structures was the main notion in LFG, and
historically, this was a key difference between LFG and theories that
evolved from Martin Kay's Functional Unification Grammar (FUG), a
related but formally quite different tradition.  I have always thought
of HPSG as evolving from the approach that Martin was advocating in the
late 70's, but maybe there was no direct relation.   In the early 70's I
had come up with the idea of using hierarchical AV structures as
syntactic representations, and Martin and I then had great debates about
the best way of characterizing such structures.  We had come from the
computational tradition of the ATN framework, where the order in which
operations were executed would determine the end result.  But that
seemed linguistically incorrect:  there were a number of constructions
that were quite difficult to handle--I remember passive tag questions as
a particularly good (or bad) example.  And the order-dependence of the
result also made it difficult to compute efficiently and correctly when
the most reliable information in a sentence (or in a speech signal) was
found towards the end instead of at the beginning.  With an
order-dependent system it was difficult to move the computation outwards
from so-called "islands of reliability" without inventing complicated
simulations of the correct order computed in the wrong way.

So these (linguistic and computational) were two quite different
motivations for trying to come up with an order-free way of
characterizing attribute-value structures.   A third motivation had to
do with psycholinguistic modeling--wanting to allow the computation to
be ordered by plausible psycholinguistic heuristics while still
guaranteeing the correctness of the result.  After fiddling for several
years, Martin and I came up with different intuitions about how to
proceed.  Martin came up with the unification idea, noting that it
solved the order-dependence problem because unification was a
commutative and associative operation.  He also noticed that there was a
similarity between the description system (the grammar) and the objects
being described (the av matrices)--you could present the descriptions in
the form of partial av matrices and use unification of such matrices to
get a matrix that described the structures satisfying conjunctions of
the original descriptions.  So he had an underlying idea of descriptions
as distinct from structures/models, but then argued that there was a
redundancy in having both descriptions and objects around to work with.
He followed Occam's advice and cut out one of them--the objects--as
being unnecessary.  So FUG really was all about operations on partial
descriptions, and the interpretation of the FUG formalism was to be
found in the definition of the unification operation.

I followed a different course in developing what eventually became the
LFG formalism.  Still looking for an order-free system for
characterizing av structures, I decided to remove order and computation
(or the incremental accumulation of information) completely from the
picture.    I took the noncomputational, more mathematical approach of
specifying a set of formulas and a set of structures and then defining
whether or not a given structure satisfies a given set of formulas.  And
despite Martin's observation that it was possible in principle to make
the constraints resemble the structures, I went the other way.  I made
the constraints and the structures they describe look very very
different.  The constraints were formulas in an algebraic notation
(things like (f A)=B), and the structures were finite tabular functions,
represented as av matrices  (things like [A B]).  This helped to
avoid lots of confusion about which was which, and it also made it easy
to include descriptive devices that don't have any analog in the
structure space (disjunction and negation being two obvious devices that
can be represented as funny objects only if you assign a special
interpretation to the objects as Goedelizations of the formulas, in
which case you have to figure out new, special subroutines of
unification to get the computation to come out right).  In a sense, I
took a different view of Occam's razor: the admonition is to not
multiply entities *beyond necessity*--and I thought it was necessary to
distinguish descriptions from structures in order to understand what was
going on.

My impression as an outside observer is that HPSG started out more along
the lines of FUG, and that the current model-theoretic understanding was
developed over a period of time by Kasper and Rounds, Smolka, King, and
others.  It seems that the grammar itself is still presented in the form
of (heavily annotated) attribute-value matrices, but these are clearly
distinguished from the underlying models which are defined as graph
structures.  This makes for some confusion in moving back and forth
between the frameworks:  the equations in LFG correspond to the av
matrices of HPSG, while the av matrices of LFG correspond to the HPSG
graphs.  There also seems to be a difference in practice, again from the
perspective of an outside observer:  In LFG linguistic papers you will
always see both equations and structures and the discussion will be
about the satisfaction relation.  My sense of the HPSG literature is
that the graph models are not usually exhibited in linguistic papers and
that the linguistic discussion and results are expressed in av diagrams
(= descriptions); the graphs only show up in introductions to the
formalism or in more formally oriented work and don't provide much
linguistic leverage.

End of historical digression.

So the LFG formalism as presented in KB82 really did have a
model-theory, total structure view of the world.  But I did not try to
belabor the formal set up in that paper, and these properties of the
theory might not be apparent from a casual reading by someone who is
really looking for a succinct statement of the wffs and their
interpretation.  The basic concepts are all there in a more discursive
form that was hopefully more accessible to the intended audience of
linguists, computer scientists, and psychologists.  Indeed, Johnson in
his thesis and later Smolka in a personal communication both
acknowledged the resemblance of the original LFG formalism to the
mathematical set ups that they later presented.  (There was an earlier
draft of KB82 that took a much more mathematical slant and built up
everything from first principles, but it really was tedious and it was
soon thrown out.)

If you look at KB82, though, you will see quite early on a specification
of the co-domain of structures (example (4) and the discussion around
it):  the f-structures were taken to be monadic, finite functions from
symbols into values that were drawn from symbols, semantic forms,
f-structures, and (simple) sets of such values.  The description
formulas were equalities (and later in the paper, set-membership
assertions) between designators consisting of constants and function
applications expressed in Cambridge Polish notation (conventional f(a)
is written as (f a)).  And the interpretation was pretty clear too, even
though a little informal.  A designator (f a) denoted the value of
attribute a in the f-structure denoted by the variable f, and (f a)=v
holds iff (f a) and v both denote the same object.  And so on.
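
In rough modern type notation (a Python sketch of my own, not anything in
the paper), the value space amounts to the following:

    from typing import Dict, FrozenSet, Union

    Symbol = str                    # e.g. "SG", "ACC", "7/8"
    SemanticForm = str              # e.g. "'walk<SUBJ>'"
    # A value is a symbol, a semantic form, an f-structure, or a (simple) set
    # of such values; an f-structure itself is a monadic, finite function
    # from symbols to values.
    Value = Union[Symbol, SemanticForm, "FStructure", FrozenSet]
    FStructure = Dict[Symbol, Value]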

You (Carl) wondered about the fact that I also made use of an ordering
relation on f-structures as a way of picking out the minimal model.   I
don't think this puts us back in the unification camp.  I was not trying
to deal with growth of models as you accumulate information.  Rather, I
was addressing the fact that there could be many solutions (infinitely
many, in fact) to any set of formulas over the monadic functions.  If a
given candidate function satisfied a description, then any extension of
that candidate would also be a model (that is, any structure with a
larger domain whose values were, by hypothesis, not constrained by the
description).  These larger models contained no linguistically specified
information and I thought they should therefore be defined out of
linguistic consideration--so I specified the smallest one as the
linguistically relevant structure.  (Picking the minimal model in LFG is
equivalent to the notion of "fully described" in the Construction
Grammar formalism--it expresses the pretheoretic intuition that the
linguistically relevant model shouldn't have more stuff in it than the
description called for.)

Some confusion might arise (and has arisen) when the formal conceptions
of other approaches (earlier versions of HPSG, but perhaps not modern
versions) have been applied to the LFG situation.  I have had
discussions where people wonder what would happen if you added another
constraint--how would the model grow?  The answer is, you can't and it
doesn't.  The presupposition is that you always are concerned with a
given set of formulas, known in advance and unchanging, and you can test
various candidate models against it.  Of the ones that satisfy it, you
pick the smallest one.

You might want to consider what happens as you take the f-descriptions
associated with larger and larger subtrees of the c-structure.  Suppose
you have the f-description for a given subtree, and you find the
smallest structures that correspond to the nodes and that satisfy the
description.  And now you go to a higher node whose subtree by
definition gives a description that at least contains the description
that you found in the smaller subtree.  You now consider the smallest
structures that satisfy the larger description.   It turns out that if
you were to look at the f-structures assigned to the nodes in the
original subtree under the original description and compare those to the
f-structures assigned to those same nodes by the larger description, it
is typically the case that the f-structures for the smaller description
subsume (or are functional restrictions of) the f-structures for the
larger description. This would be true, for example, if the description
devices are all monotonic.   But this still doesn't involve one
(partial) structure growing into another--there are just two sets of
complete structures that typically stand in a certain relation.
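
For concreteness, here is a small sketch of that relation, again with a
representation and names (subsumes, minimal) that are mine rather than the
theory's: subsumption is just functional restriction between two complete
structures, and the minimal model is the solution that every other solution
extends.

    def subsumes(small, large):
        """Is small a functional restriction of large, i.e. does small specify
        nothing beyond what large compatibly specifies?"""
        if isinstance(small, dict) and isinstance(large, dict):
            return all(attr in large and subsumes(val, large[attr])
                       for attr, val in small.items())
        return small == large        # symbols, semantic forms, etc. must match

    def minimal(solutions):
        """Among complete structures that satisfy a description, pick one that
        every other solution extends -- the linguistically relevant model."""
        return next(s for s in solutions if all(subsumes(s, t) for t in solutions))

    f1 = {"SUBJ": {"NUM": "SG"}}                           # smallest solution
    f2 = {"SUBJ": {"NUM": "SG", "PERS": "3"}, "X": "y"}    # a larger solution
    print(subsumes(f1, f2), minimal([f1, f2]) == f1)       # True True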

I hope that this picture helps to clarify the basic formal set up I
provided for LFG.  The goal wasn't necessarily to find an existing
model-theoretically interpreted formal language and then to embed the
LFG formalism inside that.  Rather, I was attempting (and as you'll see,
we are still attempting) to define a linguistically appropriate formal
language and then to give it its own precise model theoretic
interpretation.  To the extent that we can map onto an existing system
(such as QFTE or Schönfinkel-Bernays) we are very happy--much less work
to do.  But we are not especially interested in the enterprise, which I
gather is what motivates you, of discovering a system that is rich and
general enough so that all the special properties we want can be easily
defined and internalized.

I worry that a system that is rich enough to internalize all the special
things that we want to do (phrase structure recursion, monadic
functions, equality and set membership, functional uncertainty and
distribution over sets) will have so much power that its
mathematical properties will be uninteresting.  And worse, if it allows
closure over all the different kinds of descriptive devices, I worry
that a large number of interesting questions will be undecidable, and
many important computational problems will be difficult to solve.   So,
instead of making available the full power of the typed lambda calculus,
arbitrary quantification, etc., we carefully define from the bottom up
just those devices that we think are linguistically justified, and we
specify very particular ways in which those devices can interact.  And
wherever possible we try to take advantage of the fact that we are
introducing only special cases of more general mechanisms.  Thus you'll
see that the LFG formalism does not have lambda abstraction, does not
have general operations on sets or sequences (e.g. you can't specify
intersection, union, append, or difference--only membership and, in D&K,
subset), has only nonconstructive negation, and has only implicit
existential and universal quantification.  These facts are relevant to
the more particular set of questions that you asked.

 Back to your message...

You asked whether I still "intend that LFG sets would be, literally, the
simple sets of ordinary ZF set theory".  To suggest that that might not
*really*  be the case, you pointed out what look like some
inconsistencies in the framework that Mary and I presented in our
Language paper.

Well, the answer is Yes, I really still do intend the underlying models
to be simple sets.  In my mind, the issues you are raising don't have to
do with whether or not the underlying objects are real sets, but rather
what the proper interpretation is for the descriptive devices we have
proposed for characterizing the sets.

You gave two examples of set descriptions that you suggest may be
inconsistent.  Supposing for the moment that these are true
inconsistencies, there is no general principle for inferring from that
fact anything about what may or may not exist in the underlying models.
To take a trivial example, from the equations x=1 and x=2 I can infer
the inconsistency 1=2, but that doesn't make me deny the existence of
the natural numbers.  Instead I would conclude that this particular
description doesn't describe a natural number.  And that would be
perfectly fine, unless I had an elaborate theory that generated these
particular equations as an account of some particular phenomena I was
trying to model.  This would be a strike against my theory, but again,
not necessarily against the natural numbers.  I would be challenged to
look for more appropriate descriptive devices over the natural numbers
that give better models of the phenomena.

In fact, however, I don't think your particular examples give rise to
inconsistencies and thus don't pose new challenges for the way that we
have (carefully) set things up.  The descriptions that you have
constructed do not support the inferences that you draw, and the
analyses of indeterminacy and resolution that Mary and I outlined go
through as we claimed, with only simple ZF sets in the LFG ontology.

Your first example:
    On p. 770 (discussion around (45) and (46)) the
    logic seems to be inconsistent if the things you call sets are
    LITERALLY sets. For let P be the property

         P \def= lambda x ((w CASE) = x).

    Then for s = {ACC,GEN}, you would have

        P(s) iff \forall f \in s. P(f).

    i.e.

        (w CASE) = {ACC,GEN} iff [((w CASE) = ACC) \land ((w CASE) = GEN)]

    But the left-hand side of the latter biconditional is true
    while the right-hand side is false. This made me wonder whether
    there is something going on with `LFG sets' that makes them unlike
    ZF sets. Or maybe the lambda-term I used is not an acceptable LFG
    property? (This issue would be clarified if LFG's were actually
    written in lambda calculus.)

My reply:
This brings out a couple of issues.  First, the lambda term you wrote is
indeed not an acceptable LFG property.  In fact, the LFG formalism
doesn't provide any mechanism at all for a grammar writer/linguist to
construct properties other than the ones that are pre-defined.  We may
use lambda abstraction or other conventions in our metalanguage to explain
how a built-in operator or notation works, but we don't provide lambdas
or other devices that would allow new kinds of properties to be created
inside the formalism.  This is just a restatement of the point made
above, that we aren't after a formalism rich enough to internalize
constraints and properties beyond the ones that we (as formalism
designers) have carefully thought out.  Our characterization of the
distribution scheme in (44) (extended slightly in (73)) may have led
you astray, since it is phrased abstractly in terms of "any property P".
 But the only P's are the ones we provide, namely, ones built up out of
function application expressions.

But let's suppose you want to take on the role of formalism designer and
introduce the peculiar property P above, thereby suggesting that it has
some real linguistic utility.  That would still not lead to the
putative inconsistency that concerns you.  You defined the property P as
above, and then asked for the value of the expression P({ACC, GEN}).
At that point you invoked the distribution schema (44/73) to distribute
the property across the elements of the set {ACC, GEN}, getting the false
statement (w CASE)=ACC \land (w CASE)=GEN, and you contrasted that with
the independent true fact in this context that (w CASE)={ACC, GEN}.

The flaw in the argument is that distribution doesn't apply in this
situation.  In the text above (44) and in Kaplan and Maxwell (88) and
elsewhere, we are pretty clear that the distribution schema is invoked
only when a property that normally applies to a function (f-structure)
is instead applied to a set of f-structures.  Distribution thus is a way
of assigning a linguistically useful interpretation to propositions that
would otherwise always evaluate to False.  Of course, the properties
that normally apply to functions are just those that involve extracting
the value of an attribute/argument--what else can you do with a
function?  Your property P does not involve extracting the value of an
attribute from its operand x, hence distribution is irrelevant whether
or not x denotes a set.

To put this in more formal terms, I'll use [[ ]] to pick out the
denotation of an expression.  The normal interpretation of a
function-application equality (f a)=v is given by
	(f a)=v  iff   [[f]] is an f-structure, [[a]] is a symbol, and the
		pair <[[a]],[[v]]> is in [[f]].
On this interpretation, (f a)=v would be false if [[f]] is a set and not
an f-structure.  The extension for distribution, schematized in (44) and
(73), gives an extended definition:
	(f a)=v  iff
		If [[f]] is an f-structure and [[a]] is a symbol,     (normal)
			then the pair <[[a]], [[v]]> is in [[f]].
		If [[f]] is a set and [[a]] is a symbol,              (distribution)
			then for all g in [[f]], (g a)=v.
		Otherwise False.
The crucial point is that distribution only applies when attempting to
obtain the value of an attribute from a set, when attempting to treat a
set as a function.   It doesn't apply in your first example.  (Later, in
(73) we restrict this further so that something else happens for
attributes that are declared to be "nondistributive".)
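
If it helps, that extended definition can be written out as a little Python
sketch (purely illustrative, with a Python list standing in for a set only
because dictionaries aren't hashable):

    def equals(f, attr, v):
        """Truth of (f attr) = v under the extended interpretation above."""
        if isinstance(f, dict):                  # normal: f denotes an f-structure
            return attr in f and f[attr] == v
        if isinstance(f, list):                  # distribution: f denotes a set
            return all(equals(g, attr, v) for g in f)
        return False                             # otherwise

    s = [{"CASE": "ACC"}, {"CASE": "ACC"}]       # a set of two f-structures
    print(equals(s, "CASE", "ACC"))              # True, by distribution
    w = {"CASE": {"ACC", "GEN"}}                 # here (w CASE) = {ACC, GEN}
    print(equals(w, "CASE", "ACC"))              # False: no attribute is being
                                                 # extracted from a set, so the
                                                 # distribution clause never fires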

Your first example thus fails because you applied distribution when it
was not warranted.  The second example fails for the opposite reason: you
didn't apply distribution when it was necessary.  This is more
interesting, since here you are using specifications that are important
to the linguistic account that we are presenting.  If we had an internal
inconsistency, then we would have to revise something (you suggest maybe
giving up simple sets; I probably would look for a better way of
describing them).  Fortunately, I think that neither of these moves is
necessary.

You wrote:
    Likewise, on p. 772 (discussion around (55) and (56)) the logic
    again seems inconsistent if what you are talking about are real
    set-theoretic sets. We have

       (f SUBJ CLASS) \in {7/8,9/10}.

    So by (35) (p. 767)

       either (f SUBJ CLASS) = 7/8 or  (f SUBJ CLASS) = 9/10

The argument breaks down right here.  (35) is an observation
(essentially an instance of a standard set-theory axiom) we displayed to
explain why using set representations aligns with some of the
disjunctive intuitions that come along with indeterminate values.  We
pointed out that if x stands for some unknown value, then
	(35)  x \in {NOM, ACC} => NOM=x \lor ACC=x
You applied this general principle to launch your argument, but you
didn't apply it to some unknown value.  You applied it to the expression
(f SUBJ CLASS), on the assumption (not unreasonable in most other
systems) that (f SUBJ CLASS) was as good a designator as anything for
some particular but unknown value, and that you could therefore simply
take the initial CLASS membership statement as an instance of the
antecedent proposition in (35), letting the expression (f SUBJ CLASS)
stand for the particular unknown that it denotes.

But in this context the expression (f SUBJ CLASS) doesn't denote
anything.  Therefore the antecedent in (35) does not hold, you therefore
can't infer the disjunction, and therefore you don't arrive at the
inconsistency.

Suppose we were to evaluate the expression (f SUBJ
CLASS) to find out what it denotes.  Well, the notation is
left-associative, so (f SUBJ CLASS) is equivalent to ((f SUBJ) CLASS),
and given the f-structure that f denotes in our example, (f SUBJ)
denotes the set s={e, h} where e and h are f-structures.  So we are
asking for the value of ({e, h} CLASS), which means we are trying to
extract the value of an attribute from a set of f-structures.  That's
undefined--we can't do that.  So there is no particular value that can
be assigned to x to justify the substitution into (35).

What we can do, however, is determine the truth value of the proposition
(f SUBJ CLASS) \in {7/8, 9/10} using the interpretation scheme sketched
above, with set membership replacing equality.   From the distribution
clause we can infer that
	(e CLASS) \in {7/8, 9/10} \land (h CLASS) \in {7/8, 9/10}
We've broken through the set barrier to the point that we have
expressions that do denote particular values, and now we can apply (35)
to each of these, determining that
	(e CLASS)=7/8 \lor (e CLASS)=9/10    etc.

Thus we get the desired effect that the expansion to disjunction can
happen only AFTER the distribution, not before, and the source of your
inconsistency disappears.
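
In the same sketchy terms as above (and with the same caveats), the whole
evaluation runs like this: distribution over (f SUBJ) happens first, and
only afterwards does each conjunct license the disjunctive expansion via
(35).

    def member(f, attrs, values):
        """Truth of (f a1 ... an) being a member of values, distributing
        whenever an attribute is applied to a set."""
        if not attrs:                            # nothing left to extract:
            return f in values                   #   plain set membership
        if isinstance(f, list):                  # attribute applied to a set:
            return all(member(g, attrs, values) for g in f)
        if isinstance(f, dict) and attrs[0] in f:   # normal attribute extraction
            return member(f[attrs[0]], attrs[1:], values)
        return False

    e, h = {"CLASS": "7/8"}, {"CLASS": "9/10"}
    f = {"SUBJ": [e, h]}                         # (f SUBJ) denotes the set {e, h}
    print(member(f, ["SUBJ", "CLASS"], {"7/8", "9/10"}))    # True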

You see that we approach these descriptive problems in a style that
differs from one that you might prefer.  We have preserved the simple
notation that we think is linguistically attractive and we have
preserved the simple models of ordinary sets.  What we have complicated
is the way that the notation is interpreted into the models.  An
alternative would have been to have a notation with universal
quantifiers and predicates that explicitly test for whether or not
things are sets or f-structures (we don't want the expressions/variables
to be statically typed as sets or f-structures, since our coordination
story crucially depends on a set sometimes showing up in places where an
f-structure usually appears).  Then instead of (f SUBJ CLASS) \in {7/8,
9/10}  (and for any other proposition where an attribute-value might end
up being extracted from a set), we would be able to write
   If (f SUBJ) is an f-structure, then (f SUBJ CLASS) \in {7/8, 9/10}
   If (f SUBJ) is a set, then for all g in (f SUBJ), (g CLASS) \in {7/8, 9/10}
and then the interpretation would be simple.  And it would be absolutely
clear, by inspection, where and when a deduction based on (35) could be
done.  But this clarity comes at a cost:  the notation would be
cumbersome and awkward to work with, and we think the system would be
much more expressive than we need or want.

I'll finish up with some brief replies to a few more of your remarks:

You wrote:
    I understand that the domains of these functions are sets of
    attribute names.  What are their codomains?
As mentioned above, in KB82 I specified that the attribute values were
taken from symbols, semantic forms, f-structures, and the power set of
the foregoing.  Not much has changed since then.  There have been some
ideas about decomposing attributes into feature complexes and a few
other minor wiggles, but the ontology has remained essentially stable
for over 20 years.  But I don't want to make too much of domain
stability as a theoretical value--someone might point out that
transformational grammarians have been fixated on trees for almost 50
years!  What we have done over the years is extend the descriptive
notation slightly (e.g. introducing functional uncertainty), modify its
interpretation (e.g. the distribution schema), and work to understand
the mathematical, linguistic, and computational properties of the
original set up.

I had written:
    I think several years ago, when Ivan and I were doing the
    LFG/HPSG comparison course, we had a discussion about one
    difference in the underlying models, noting that in LFG equivalent
    structures are considered to be equal, whether or not they are
    asserted to be equal, whereas this is not true of HPSG models.

and you remarked:
    That is true, although I also recall from that discussion that
    there were non-public ways in LFG to make equivalent-looking
    structures unequal, essentially by `uniquifying' their semantic
    forms (i.e. the semantic forms are unequal but structurally
    identical).
For the record, the idea of instantiating semantic forms was really
quite public.  Three or four pages are devoted to this topic in KB82,
around example (105).

Your message went on to consider a number of additional formal questions
and to sketch a different way of organizing and conceptualizing things.
I may have some reactions to that, after I've had a chance to work on it
under the guidance of our in-house category theorist.  But now I think
I'll rest for a while.

I hope I have been able to make clear some of the formal intuitions that
underlie the LFG framework, and I hope what I have described seems
reasonable and coherent even though you may have a different take on
some of the basic issues.  It is interesting and worthwhile, I think, to
get precise enough so that we can identify crucial similarities and
crucial differences between LFG and HPSG, and also to understand how
they relate to the new formalization that you are pursuing.

Regards,

Ron


