[HPSG-L] relation between Formal Grammar of Human Language and CFG

Guy Edward Toh Emerson gete2 at cam.ac.uk
Fri Aug 12 18:50:54 UTC 2022


Dear all,

Sorry to be a little late to the discussion (I'm currently on parental leave and only occasionally replying to emails).

I wanted to add that Emily M. Bender and I discuss formalism vs. theory in our chapter in the HPSG Handbook -- Chapter 25 here: https://langsci-press.org/catalog/book/259

I think our comments sit well with what has been said in this email thread, but it might be helpful to connect them to Thomas's two reasons:

Reason 1 (explanations in the theory, not the formalism) -- from a practical point of view when working with large-scale HPSG grammars, this is helpful because software packages support the formalism, which is stable over time even as the theory develops.  So it becomes easier to write and maintain large-scale grammars.  We discuss this in section 2.1.

Reason 2 (computational complexity) -- in the general case, HPSG is Turing-complete, and so parsing is undecideable.  However, particular grammars can have better complexity guarantees.  Furthermore, average-case complexity can be much better than worst-case complexity.  In practice, an HPSG grammarian might choose to write a grammar so that parsing can be done more efficiently.  We discuss this in section 2.2.1.

Best,
Guy

-----Original Message-----
From: HPSG-L <hpsg-l-bounces at listserv.linguistlist.org> On Behalf Of Roussanka Loukanova
Sent: Saturday, August 6, 2022 12:30 PM
To: chris brew <cbrew at acm.org>; Thomas Graf <hpsg-l at thomasgraf.net>; hpsg-l at listserv.linguistlist.org
Subject: Re: [HPSG-L] relation between Formal Grammar of Human Language and CFG

Dear Chris, Dear Thomas,

Thank you for your excellent explanations, needed.

Best Regards,
Roussanka


On Sat, 6 Aug 2022 at 01:03, Thomas Graf <hpsg-l at thomasgraf.net> wrote:

> Dear Roussanka,
>
> What you're saying lines up closely with the split between formalism and
> theory that I had in mind.
>
> A formalism defines the general shape of structural atoms and gives you
> some recursive procedure for combining structures into larger
> structures. The theory defines the actual set of structural atoms for
> language and how exactly the recursive procedures are to be applied.
>
> Note that a theory need not correspond to a single grammar. For example,
> once you adopt a theory of parts-of-speech, then only a subclass of all
> possible CFGs are compatible with that theory. It's still a collection
> of grammars, though.
>
> Both formalism and theory thus pick out collections of grammars, the
> difference is that the formalism focuses on aspects that are content
> agnostic whereas linguistic theories tend to be highly content-specific
> (types of categories, features, markedness requirements or prominence
> hierarchies that restrict how features may be combined, linguistic
> distinctions like count nouns VS mass noun, and so on).
>
> >Now, you are giving a new life to these two notions, a formalism and a
> >theory, it seems to me, with respect to the questions about whether
> >computational syntax of human language is CFG or more demanding CSG.
>
> Actually, the point I wanted to make is largely orthogonal to how
> complex syntax actually is. It arises as soon as you believe that not
> every computable set of strings, trees, or graphs is a possible natural
> language. The question then is, where do you want to put the
> restrictions: the formalism, the theory, or both? That ultimately
> depends on your goals, and I merely wanted to point out that there are
> some very concrete reasons why one would want to put at least some
> things in the formalism.
>
> But since your original question is more about the string complexity of
> syntax, let me add some more obscure points that haven't been mentioned
> yet:
>
> 1) The extensionalist strawman: If we're talking about language as an
> observable object in our world, then every natural language is actually
> finite because i) there is an upper bound on the length of sentences
> that can be uttered before the heat death of the universe, and ii) at
> any given point until the heat death of the universe, we only have a
> finite number of speakers. Since all finite languages are regular,
> syntax is at the bottom of the Chomsky hierarchy (btw, there's many
> classes of languages that are properly weaker than the class of regular
> languages, they just weren't around when the Chomsky hierarchy was
> defined).
>
> 2) The bandname conundrum: If you reduce languages to sets of
> well-formed strings, you run into a peculiar problem. Every random
> string is a potential bandname, and every bandname is a possible
> sentence of English:
>
> Q: Which band are you listening too?
> A: [band name]
>
> All languages that permit A as a valid answer define the same stringset,
> \Sigma^*, which is the simplest non-empty formal language.
>
> 3) Since the extensionalist strawman and the bandname conundrum are
> rather useless perspectives on language, we have to make some additional
> assumptions. First, we factor out bandnames and similar things by
> treating them as atomic symbols akin to N. Second, natural languages are
> taken to be infinite objects. For example, we may allow arbitrarily many
> conjuncts within a coordination. In this case, syntax can still be
> regular. In order to go beyond regular, we have to assume that specific
> constructions are unbounded:
>
> a) center embedding in English and many other languages -> syntax is at
> least context-free
>
> b) crossing dependencies in Dutch (Huybregts 1984), Swiss German
> (Shieber 1985), or Tagalog (Maclachlan & Rambow 2002) -> at least Tree
> Adjoining Languages
>
> c) unbounded triplication in morphosyntax -> at least Multiple
> Context-Free Languages
>
> d) Case stacking in Old Georgian (Michaelis & Kracht 1997), relativized
> predicates in Yoruba (Kobele 2006) -> at least Parallel Multiple
> Context-Free Languages (this seems to be the end of the line, which is
> interesting because PMCFLs are the most complex formal languages that
> still have parsing algorithms that run in polynomial time)
>
> Are any of these constructions actually unbounded? No, not in terms of
> what speakers can produce or understand. But assuming unboundedness
> often allows for a more succinct description. CFGs provide a more
> compact description than finite-state automata, TAGs can handle many
> things more elegantly than CFGs, and so on. Savitch (1993) "Why it Might
> Pay to Assume that Languages are Infinite" is a nice discussion of why
> unboundedness can be a useful assumption even when it isn't true.
>
> But all of this means that there is no such thing as **the** string
> complexity of syntax. Complexity increases or decreases depending on
> which constructions you assume to be unbounded, and that choice should
> be based on the specific issues you are trying to explore.
>
> -- T
>
>
> On 2022/08/05  23:10, Roussanka Loukanova wrote:
> >Dear Thomas,
> >
> >To better understand the chiming, how do we determine:
> >
> >- a formalism
> >- a theory
> >- the relations between these
> >
> >For instance, what the two distinctions in / within  CCG and HPSG:
> >
> >HPSG the formalism / HPSG the theory
> >CCG the formalism / CCG the theory
> >
> >I vaguely recall a discussion about formalism vs theory in the list, some
> >time ago, without recalling the final output. Or was it about some other
> >related notion(s), in relation to a formalism?
> >
> >Now, you are giving a new life to these two notions, a formalism and a
> >theory, it seems to me, with respect to the questions about whether
> >computational syntax of human language is CFG or more demanding CSG.
> >
> >For instance, I can give my own interpretation of these two notions, e.g.,
> >within Chomsky hierarchy of formal grammars, in the following way:
> >
> >Regular Grammars (Finite State Automata) / CFG / CSG are classes of
> >grammars, three formalisms?
> >
> >These three classes, i.e., three formalisms, have their own specific
> >theories, represented by their corresponding, specific grammars, e.g.:
> >
> >A given CFG grammar Gi provides a specific CF theory, as a representative
> >of the CFG formalism, consisting of various components, e.g.:
> >(1) Gi - the set of nonterminals, terminals, rules
> >(2) Li = L(Gi) the language of the parse trees, which are generated, i.e.,
> >determined by Gi
> >(3) various properties of Li, etc.
> >
> >Best Regards,
> >Roussanka
> >
> >
> >On Fri, 5 Aug 2022 at 20:29, Thomas Graf <hpsg-l at thomasgraf.net> wrote:
> >
> >> Formal grammarian/computational linguist here who can't resist chiming
> >> in. Advance warning: I tried to keep this short, but it's a long read...
> >>
> >> >While CCG and TAG people will tell you that this is way too
> >> >powerful, HPSGians usually have a different view. The formalism
> >> >should not be constraining but the theory formulated within the
> >> >formalism has to be as restrictive as possible. Carl Pollard argued
> >> >for this in a paper.
> >>
> >> The reason CCG, TAG, MGs, and others worry about the "too powerful" part
> >> is actually two reasons in one. My impression is that the HPSG community
> >> tends to focus on the first reason when it is actually the second reason
> >> that motivates formal grammarians' search for restricted formalisms.
> >>
> >>
> >> Reason 1
> >> ========
> >>
> >> Linguistics is in the business of characterizing what separates
> >> the patterns we find in natural languages from arbitrary mathematical
> >> patterns, e.g. dependencies that involve the ability to distinguish
> >> prime numbers from numbers that aren't prime (this is within the class
> >> of recursively enumerable string languages).
> >>
> >> For this goal, it really does not matter whether one puts the
> >> restrictions in the formalism or the theory. For example, no language
> >> seems to have cowardly islands, i.e. cases where an island loses its
> >> island status unless there are at least n other islands in the sentence.
> >> "HPSG the formalism" could define such islands, but they would still
> >> look very odd within "HPSG the theory".
> >>
> >> Quite simply, what we want is explanations, and it's not really that
> >> important whether those come from the formalism or the theory stated
> >> within the formalism. HPSG is justified in doing this purely through
> >> the theory, but CCG, TAG, MGs and other communities are equally
> >> justified in doing this (largely, but rarely exclusively) through the
> >> formalism.
> >>
> >> Reason 2
> >> ========
> >>
> >> We don't just want grammatical descriptions, we also want parsing and
> >> learning algorithms with safeties and guaranteed computational bounds.
> >> And you get those safeties and bounds by exploiting computational
> >> limitations of the formalism. If, for instance, your syntactic
> >> derivations have a finite-state backbone, that opens up a whole
> >> toolbox of tricks you can rely on for parsing and learning.
> >>
> >> Now in principle this should work just as well no matter whether we
> >> start with a restricted formalism or an formalism restricted by a
> >> theory, but in practice it simply doesn't. The restrictions of
> >> linguistic theories are not readily exploited in parsing and learning
> >> algorithms, and they are difficult to translate into computational
> >> properties that could be exploited this way.
> >>
> >> In particular, it is not obviously true that we can feasibly
> >> reaxiomatize HPSG into a more restricted formalism once we have
> >> absolute certainty about the restrictions of the theory. The fact that
> >> such a reaxiomatization exists does not mean that we can find it, or
> >> that we can prove that this reaxiomatization is correct and covers all
> >> of HPSG as a theory. To wit, it's unclear whether the TAG
> >> implementation of HPSG in Kasper et al (1995) actually accommodates
> >> "HPSG the theory".
> >>
> >> If you care deeply about these issues, starting with a restricted,
> >> well-understood formalism that gets expanded as needed is the safer
> >> route.
> >>
> >>
> >> A few other points
> >> ==================
> >>
> >> Despite what I just said above, I think the difference between the
> >> HPSG approach and the formal grammar approach isn't actually all that
> >> big.
> >>
> >> The modus operandi of HPSG is to take an unrestricted description
> >> language and then state within that the linguistic restrictions that
> >> we care about. When restrictions turn out to be too strong, we loosen
> >> them, when it turns out there's empirical slack, we tighten them.
> >>
> >> Well, that's exactly what formal grammar does. Our unrestricted
> >> description language is math itself, and with that language we state
> >> the linguistic restrictions we care about --- these restrictions are
> >> what's called a formalism. Just like HPSG theory isn't set in stone,
> >> formalisms aren't set in stone: we can restrict TAG, expand it, design
> >> a completely new formalism, and so on.
> >>
> >> To me, the central advantage of the formal grammar approach is that
> >> you can have two types of restrictions: fine-grained linguistic
> >> restrictions that explain the empirical lay of the land, and the broad
> >> computational restrictions that can be relied on in parsing and
> >> learning. These broad restrictions also provide new hooks for
> >> artificial language learning experiments; they close the gap to
> >> neuroscience where we do not have the means yet to test our
> >> fine-grained linguistic restrictions; they provide the foundation for
> >> comparisons to animal "language", and much more.  Whenever we only
> >> need to be roughly in the right ballpark rather than exactly right,
> >> computational restrictions are an invaluable asset.
> >>
> >> In a nutshell: Computational restrictions don't limit linguistic
> >> research, they foster it.
> >>
> >> Cheers,
> >>    Thomas
> >>
> >>
> >> On 2022/08/05  17:11, Stefan Müller wrote:
> >> >Dear Roussanka,
> >> >
> >> >The general insight from the eightees was that human language is not
> >> >context free. Some people call it mildly context-sensitive. Since GPSG
> >> >was constructed to be context free, people were looking for something
> >> >more powerful and abandoned it in favoure of GPSG. This taken together
> >> >with a more lexical view resulted in HPSG, which has exactly the right
> >> >generative capacity: It has Turing power. =:-)
> >> >
> >> >While CCG and TAG people will tell you that this is way too powerful,
> >> >HPSGians usually have a different view. The formalism should not be
> >> >constraining but the theory formulated within the formalism has to be
> >> >as restrictive as possible. Carl Pollard argued for this in a paper.
> >> >
> >> >The whole discussion and references can be found in the HPSG handbook
> >> >and in my Grammar Theory textbook. I have a (brief) chapter on
> >> >generative power (Chapter 17):
> >> >
> >> >https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flangsci-press.org%2Fcatalog%2Fbook%2F259&data=05%7C01%7Cgete2%40universityofcambridgecloud.onmicrosoft.com%7C5d5375f96c0d4ec004ce08da779f17c0%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637953822424427745%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=63yuMiZeco34ZyhNSiKLlyOxzp9C4K7502dTLo9ax9s%3D&reserved=0
> >> >
> >> >https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flangsci-press.org%2Fcatalog%2Fbook%2F287&data=05%7C01%7Cgete2%40universityofcambridgecloud.onmicrosoft.com%7C5d5375f96c0d4ec004ce08da779f17c0%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637953822424427745%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=GjMrix7foSg2k5yhEZCJfiyAH6SXxLAHzXnGoJUddDI%3D&reserved=0
> >> >
> >> >Geoff Pullum has a good paper on the history of the discussion. All
> >> >referenced from with in the book.
> >> >
> >> >Best
> >> >
> >> >    Stefan
> >> >
> >> >
> >> >Am 05.08.22 um 14:24 schrieb Roussanka Loukanova:
> >> >>Dear All,
> >> >>
> >> >>What is the verdict on the relations between Formal Grammar of Human
> >> >>Language and Context-Free Grammar (CFG) of Chomsky hierarchy on formal
> >> >>grammars and languages?
> >> >>
> >> >>I would appreciate very much, opinions, points to research and,
> >> especially,
> >> >>bibliographical references.
> >> >>
> >> >>Best Regards,
> >> >>Roussanka
> >> >>_______________________________________________
> >> >>HPSG-L mailing list
> >> >>HPSG-L at listserv.linguistlist.org
> >> >>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flistserv.linguistlist.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhpsg-l&data=05%7C01%7Cgete2%40universityofcambridgecloud.onmicrosoft.com%7C5d5375f96c0d4ec004ce08da779f17c0%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637953822424427745%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=E%2F53TKoDnmdH8wWRTBI4hoM0shWoWrAUdyEtHHi%2FtQE%3D&reserved=0
> >> >_______________________________________________
> >> >HPSG-L mailing list
> >> >HPSG-L at listserv.linguistlist.org
> >> >https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flistserv.linguistlist.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhpsg-l&data=05%7C01%7Cgete2%40universityofcambridgecloud.onmicrosoft.com%7C5d5375f96c0d4ec004ce08da779f17c0%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637953822424427745%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=E%2F53TKoDnmdH8wWRTBI4hoM0shWoWrAUdyEtHHi%2FtQE%3D&reserved=0
> >>
> --
> Thomas Graf
> Stony Brook University
> Department of Linguistics
> mail at thomasgraf.net
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fthomasgraf.net%2F&data=05%7C01%7Cgete2%40universityofcambridgecloud.onmicrosoft.com%7C5d5375f96c0d4ec004ce08da779f17c0%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637953822424427745%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=dJ78ELY2mmGaOC5clCGKgK7G0TQuLVLmPNGsaGn%2BH%2BE%3D&reserved=0
>
_______________________________________________
HPSG-L mailing list
HPSG-L at listserv.linguistlist.org
https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flistserv.linguistlist.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhpsg-l&data=05%7C01%7Cgete2%40universityofcambridgecloud.onmicrosoft.com%7C5d5375f96c0d4ec004ce08da779f17c0%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637953822424427745%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=E%2F53TKoDnmdH8wWRTBI4hoM0shWoWrAUdyEtHHi%2FtQE%3D&reserved=0


More information about the HPSG-L mailing list