Prevailing approaches do not have a computational lexicon

Carl Pollard pollard at LING.OHIO-STATE.EDU
Thu Oct 10 07:33:27 UTC 2002


Hi Mark,

>
Hi Carl,

Is the problem as follows?  Suppose a substring is ambiguous; it has one
analysis as an NP ACC, and another completely different one as an NP
DAT.  In a type-logical system or similar in which the feature structure
logic (specifically, disjunction and conjunction) is tightly integrated
with the c-structure (so to speak) so that there's only really one
logic, one would then be able to prove that the string also derives the
"over-specified" category NP DAT /\ ACC.  But unfortunately ambiguous
strings don't behave like the lexically neutral forms "Frauen", showing
that the "/\" cannot be regular logical conjunction.  (But it still
could be a different binary operator in a multi-modal logic, couldn't it?)
>>

Let me restate the problem. Suppose you are using a logic of syntactic
types with type constructors \meet and \join such that the "prosodic
interpretation" of these constructors is intersection and union in a
set of possible prosodic entities (usually this set is taken to be a
free monoid whose generators are thought of as prosodic words, but
this is inessential). Suppose also that you try to distinguish
ambiguity from neutrality/syncretism in the following way: your way of
saying that the prosodic entity foo is ambiguous between being an A
and being a B is to give two lexical entries <foo, A> and <foo, B>;
whereas your way of saying that foo is neutral between being an A and
being a B is to give one lexical entry <foo, A \meet B>.

The problem is that these have the same prosodic interpretation, namely
that foo is in the intersection X \intersect Y where X is the prosodic
interpretation of A and Y is the prosodic interpretation of B.
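A toy model of why the two lexicons collapse prosodically (the sets and strings here are illustrative, not from any actual grammar):

```python
# Toy prosodic interpretation: a syntactic type denotes a set of
# prosodic entities, and \meet is interpreted as intersection.
# All names here are illustrative only.

A = {"foo", "bar"}   # prosodic interpretation of type A
B = {"foo", "baz"}   # prosodic interpretation of type B

# Ambiguity: two entries <foo, A> and <foo, B> together say that
# foo is in A and foo is in B, i.e. foo is in the intersection.
ambiguous_condition = "foo" in A and "foo" in B

# Neutrality: one entry <foo, A \meet B> says foo is in A & B directly.
neutral_condition = "foo" in (A & B)

# The two lexicons impose exactly the same prosodic condition:
assert ambiguous_condition == neutral_condition
```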

The Whitman/Morrill solution is to say that in the case of ambiguity,
they weren't really both foo after all, but one was foo1 and the other
foo2, that is, the identity criterion for prosodic structures is
stronger than mere homophony.

>
The reason why I think that LFG and R-LFG don't suffer from this is that
each distinct c-structure is associated with its own f-structure; an
f-structure is formed from the constraints from exactly one c-structure,
so this merging of features from different c-structures simply cannot occur.

Have I got it right (more or less)?
>>

That depends how you analyze neutrality. How does that work in R-LFG?
If I remember right how Mary and Ron did it, if you had a homophonous
pronoun that was ambiguous (but not neutral) between nominative and
accusative, then the lexical entries would have distinct F-structures
with different CASE values, but if you had a pronoun that was neutral
between accusative and genitive, then there would be just one lexical
entry whose F-structure was a set (or maybe just its case value is a
set). That sounds pretty close to what you said above, doesn't it?
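A guess at the shape of that move, rendered as a toy sketch (my own illustrative encoding, not Mary and Ron's actual notation): ambiguity is two entries with distinct F-structures, neutrality one entry whose CASE value is a set.

```python
# Illustrative only: ambiguity = two lexical entries with distinct
# F-structures; neutrality = one entry with a set-valued CASE.

ambiguous_pronoun = [
    {"PRED": "pro", "CASE": "nom"},
    {"PRED": "pro", "CASE": "acc"},
]

neutral_pronoun = [
    {"PRED": "pro", "CASE": {"acc", "gen"}},
]

def case_compatible(fstruct, case):
    """Does this F-structure satisfy a demand for the given case?"""
    v = fstruct["CASE"]
    return case in v if isinstance(v, set) else v == case

# The neutral pronoun satisfies an ACC demand and a GEN demand at
# once; each ambiguous entry satisfies only its own case.
assert case_compatible(neutral_pronoun[0], "acc")
assert case_compatible(neutral_pronoun[0], "gen")
assert not case_compatible(ambiguous_pronoun[0], "acc")
```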

>
in fact I think that LFG
and R-LFG don't suffer from it precisely because c(onstituent)-structure
isn't integrated into the logic of features (i.e., there isn't a single
"logic" of all of LFG, but instead it consists of a heterogeneous
collection of different but coupled "logics").
>>

That sounds right too. In standard TLG the problem is that the
prosodic logic is too much like the syntactic logic: they are
connected by an algebraic homomorphism (it is a residuated lattice
homomorphism, where the two residuated lattices are the Lindenbaum
algebra of the type logic (the domain) and the powerset of a free
monoid (the codomain)).

The original critique of this setup goes all the way back to a 1961
paper by Haskell Curry. His view was essentially that Lambek was
conflating two levels of structure that Curry thought should be
clearly distinguished: what he called phenogrammar and tectogrammar.
David Dowty borrowed these terms into his version of categorial
grammar in his paper given at the 1989 Tilburg conference on
discontinuous constituency, and I think both Mike Reape and Andreas
Kathol also used these terms in their systems (Mike's was more like
CG, Andreas' a kind of HPSG). I remember suggesting to Ron once that
the distinction in LFG between c-structure and f-structure was similar
to Curry's distinction, but as I recall he didn't think so.

Sometimes you see favorable citations of Curry's paper in the TLG
literature, as if he were an advocate of something like TLG, but I
think that is a misreading. However, Curry's own proposal still used
type theory for the tectogrammar -- traditional (i.e. Curry's) type
theory of course, not Lambek's.

There's a simple way to apply these ideas to get a type theory that is
reminiscent of LFG. I'd like to go back and look at R-LFG and see if
it is similar in this respect. The basic idea is that the analog
of TLG's functional types are types of the form

   [F1 [], ..., Fn []] => [G1 [], ..., Gn []]

where the Fi are things like SUBJ and the Gj are things like CASE. The
things on both sides of the arrow are labelled products (i.e. static
record types) and the arrow is the standard cartesian (not linear)
exponential (intuitionistic implication under the Curry-Howard
isomorphism, so grammatical functions are contravariant but inherent
features are covariant). These formulas are types; the analogs of
actual f-structures are terms (proof encodings) of these types.  You
also bring in coproduct for disjunctive selection, and a primitive
Bool type, for a type-logical analog of "set values" (that is, the
analog of "set of A's" is the powertype Pow(A) \def= A => Bool). The
resulting logic is essentially Lambek and Scott's (1986) higher-order
intuitionistic logic. (So the models are toposes.)
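A term-level sketch of the idea, with labelled products as Python dicts, the cartesian arrow as an ordinary function, and the powertype Pow(A) = A => Bool as a predicate (all names here are my own illustrations, not part of the formalism):

```python
# Illustrative term-level model: labelled products as dicts, the
# (cartesian) arrow as a plain function, Pow(A) = A => Bool as a
# predicate. An "f-structure term" maps its grammatical-function
# record (the contravariant side) to its inherent-feature record
# (the covariant side).

def finite_verb(gf_record):
    # consumes a [SUBJ ...] record on the left of =>
    subj = gf_record["SUBJ"]
    # ...and yields a [TENSE ...] record on the right, provided the
    # contravariant demand (a nominative subject) is met
    if subj["CASE"] != "nom":
        raise TypeError("SUBJ must be nominative")
    return {"TENSE": "past"}

# Pow(Case): a predicate standing in for a "set of cases"
nom_only = lambda case: case == "nom"

result = finite_verb({"SUBJ": {"CASE": "nom"}})
```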

This kind of type logic has all lambda-definable subtypes, so
in particular analogs of intersection and union are definable
(but the powertypes are only Heyting algebras, not Boolean unless
you add a Boolean axiom). In this setup, a term that is ambiguous
between A and B is of type A x B, but a term that is neutral between
A and B is of type A \intersect B (the intersection is actually
defined as a subtype of the coproduct A + B).  A term that
selects ambiguously for A or B has an implicative type whose
antecedent is Pow(A) + Pow(B) (coproduct of powertypes); whereas
a term that selects neutrally for A or B has a type whose
antecedent is Pow(A + B) (powertype of coproducts).
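The selection contrast can be rendered in the same toy encoding, with predicates for powertypes and a tag for the coproduct injection (again, names are illustrative only):

```python
# Ambiguous vs. neutral selection, with Pow(X) as a predicate and
# tagged pairs for the coproduct injections. Illustrative names only.

is_acc = lambda np: np["CASE"] == "acc"
is_dat = lambda np: np["CASE"] == "dat"

# Neutral selection: one predicate over the coproduct, Pow(A + B);
# it accepts anything that passes either test.
neutral_select = lambda np: is_acc(np) or is_dat(np)

# Ambiguous selection: a coproduct of predicates, Pow(A) + Pow(B);
# the term commits to one injection before it can be applied.
ambiguous_select = ("inl", is_acc)   # committed to the ACC reading

_tag, committed = ambiguous_select
assert neutral_select({"CASE": "dat"})   # neutral: DAT is fine
assert not committed({"CASE": "dat"})    # committed to ACC: DAT fails
```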

I suspect what Mary and Ron did is closely related to what I just
described, implemented in sets, but the logic of it is much easier to
grasp if you use the => to divide f-structures into their covariant
and contravariant parts.

So the tectogrammar is based on a type logic, but not a resource
sensitive one. The appearance of resource sensitivity comes from the
way the grammar (think of something like a Montague grammar or a CCG)
specifies which triples <p, s, m> are "in", which is a kind of
labelled deductive system.  Here, as expected, p is a prosodic entity
and m is a meaning (or a term in a semantic lambda calculus); however
s is not a syntactic type, but rather a syntactic term.
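One way to picture those triples (my own toy rendering, not the post's formalism — the field contents are invented for illustration):

```python
# Toy rendering of the triples <p, s, m>: p a prosodic entity, s a
# syntactic *term* (a proof encoding, not a bare type/category),
# m a meaning term. All field contents are illustrative.
from collections import namedtuple

Sign = namedtuple("Sign", ["p", "s", "m"])

sign = Sign(
    p="Frauen",                                   # prosodic entity
    s=("lex", "frauen", {"CASE": {"acc", "dat"}}),  # a structured term
    m="women'",                                   # semantic term
)

assert isinstance(sign.s, tuple)  # s is a term, not just a type label
```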

Carl



More information about the LFG mailing list