attractions and distractions
Mike Maxwell
maxwell at ldc.upenn.edu
Sat Jun 26 00:51:56 UTC 2004
As an outsider (I haven't done anything in syntax for fifteen years or
more, which pre-dates HPSG), I have hesitated to join this discussion.
But our van is stuck in a traffic jam, so here goes...
I am told that the "generative revolution" happened not because Chomsky
and others persuaded the then-current generation of linguists, but
because they convinced their students. If one can project from that era
to now, then one might predict that the long term future of HPSG (or its
progeny) will depend on what attracts today's students. That might have
some implications for the original topic of this thread (attracting
conference papers), but there's another question I'd like to ask: if a
student today is interested in language, what sub-field of linguistics
(broadly conceived) will attract him/her?
There are a number of dimensions along which you could "arrange" people
who find language interesting. One such dimension is that of
science/math-oriented vs. literature-oriented. (I was just at the
meeting of the ALLC/ACH, and met people who fall into both ends of this
dimension at the same time; so much for a clear-cut scale...)
I suspect that those interested in linguistic theory tend towards the
science/math end of this spectrum. I've been in computational
linguistics for the last fifteen or twenty years, and I would venture to
guess that that field is going to syphon off an increasing number of
students at that same science/math end of the spectrum, who might
otherwise be drawn to theoretical linguistics. One obvious pull is the
dollars (or just jobs, never mind the pay); another is the intensive use
of computers.
This could be healthy for HPSG (and other implementable theories):
increased interest in computational linguistics could lead people to
test variants of the theory on the computer, do more grammars of more
languages on the computer, thereby pushing the limits of the theory, etc.
My sense of the field is that this is not what is happening. To be
sure, there are good tools like LKB, and there are groups building
detailed grammars. But this is not how most computational linguistics
is done. In fact, virtually all the articles I've seen in the last few
years in _Computational Linguistics_, or papers at conferences, have
been statistically based work, with only a tiny amount of built-in
linguistic knowledge. If you think of CL as engineering, this is
reasonable: parasitic gaps just aren't very common in running text, so
who cares about them? But in the end, it means that non-linguistic
techniques dominate: increasingly, computational linguistics is about
computing, not about linguistics (IMHO).
(Also, to the extent that CL is seen as successful, it may well
influence people in the psychology and cog sci departments away from
theoretical approaches.)
If my suspicion is correct, the """enemy""" (those are scare quotes) of
HPSG is not MP, but computational linguistics. That's where the
students (many of them) will go, and theoretical linguistics--of
whatever stripe--will be increasingly seen as useless. (Of course, it
will be no more useless than before CL became "big", but that's not the
point.)
That said, some of the work in CL relies on bootstrapping from
human-annotated (or checked) data. In syntax, this includes treebanks.
This work probably comes the closest to having a theoretical basis:
issues arise of argument vs. adjunct status, for instance, or the
inter-convertibility of different annotation schemes (binary vs. N-ary
branching, phrase structure approaches vs. dependency grammars).
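To make the inter-convertibility point concrete, here is a toy sketch (in Python, with a tree format and head-marking convention I made up for illustration -- no actual treebank uses exactly this) of converting a head-annotated phrase-structure tree into dependency arcs. The catch, of course, is that real treebanks don't always annotate heads, which is exactly where the theoretical questions come in.

```python
# Toy sketch: phrase-structure -> dependency conversion, assuming each
# constituent marks which child is its head. The tree encoding here is
# invented for illustration, not any treebank's actual scheme.

def to_dependencies(tree):
    """Return (head_word, [(dependent, head), ...]) for a tree.

    A tree is either a word (str) or a tuple
    (label, head_index, children), where head_index says which
    child contains the lexical head of the phrase.
    """
    if isinstance(tree, str):
        return tree, []
    _label, head_index, children = tree
    arcs = []
    child_heads = []
    for child in children:
        head_word, sub_arcs = to_dependencies(child)
        child_heads.append(head_word)
        arcs.extend(sub_arcs)
    head = child_heads[head_index]
    for i, h in enumerate(child_heads):
        if i != head_index:
            arcs.append((h, head))  # non-head child depends on the head
    return head, arcs

# "the dog barks": NP headed by "dog", S headed by "barks"
sentence = ("S", 1, [("NP", 1, ["the", "dog"]), "barks"])
root, arcs = to_dependencies(sentence)
# root is "barks"; arcs link "the" -> "dog" and "dog" -> "barks"
```

The conversion is mechanical once heads are marked; deciding *which* child is the head (and whether a given dependent is an argument or an adjunct) is the linguistically interesting part.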
So if I were in a university linguistics department today, I would see
how I could get my students involved in annotation, including
treebanking: it would give them a certain kind of computational
linguistic experience (and job experience), and at the same time it
would get them involved with real problems in real languages.
I would also hazard a guess that there will be an increasing interest in
the near future in data coming from other languages. Some of these
languages are obvious (Arabic and Mandarin have figured prominently in
recent government-sponsored work), others are perhaps not. It would be
one way to pull the field away from the English-centric model that one
poster to this thread mentioned.
Annotation of "exotic" languages might also be a way to attract interest
from the language documentation crowd.
I suppose "annotation" sounds boring, and I'm sure it sometimes is. But
it has the benefit of forcing you to pay close attention to things
that, at a quick glance, might appear simple--but which aren't.
BTW, morphology is one area that seems to have resisted the statistical
approaches. While people like John Goldsmith have built programs that
learn a certain amount of morphology, I doubt that anyone considers it
possible to do serious computational morphology work at this point on
the basis of machine learning. Odd, since I always considered syntax
more complex than morphology. Maybe the standards are higher...
Mike Maxwell
Linguistic Data Consortium