[Corpora-List] Chomsky and computationnel (sic) linguistics (fwd)

Listserv Administrator listman at listserv.linguistlist.org
Sat Aug 4 18:08:48 UTC 2007




---------- Forwarded message ----------
Date: Sat, 14 Jul 2007 20:25:54 -0700 (PDT)
From: HT_LING <ht_ling at sbcglobal.net>
To: corpora at uib.no
Subject: Re: [Corpora-List] Chomsky and computationnel (sic) linguistics

----
For those of you who may be interested, may I draw your attention to the following article that
Chuck Meyer and I wrote that was just recently published? It deals with an extensively studied
phenomenon in formal syntax that is found to be rare in discourse.

Hongyin Tao and Charles F. Meyer. 2006. Gapped Coordinations in English: Form, Usage, and
Implications for Linguistic Theory. Corpus Linguistics and Linguistic Theory 2.2 (December
2006): 129-163.

I have attached the conclusion of the article to this message.

Thank you.

Hongyin Tao
Associate Professor
Depts of Asian Languages and Cultures &
Applied Linguistics and TESL
UCLA

-----------------------

7. Concluding remarks

Our corpus-based investigation has provided empirical evidence that
gapping is a highly marked syntactic phenomenon in English. It
is virtually non-existent in interactive speech and has a very limited presence
in certain types of monologues and written registers. From a syntactic
point of view, gapping favors simple structures and is closely associated
with low-content verbs and low-Transitivity clauses. Indeed, the vast majority
of gapped clauses involve some kind of linking element and can
be deemed copula-derived. The unique syntax of gapping, featuring brevity, rhythm,
and parallelism, can be either an advantage or a disadvantage depending on the communicative
context. In seeking to understand the usage patterns of gapping
in English discourse, we have found that information flow, social interaction,
and stylistic functions all contribute to the ways in which
gapping structures are constituted and used.

Our study raises a number of issues relevant to both corpus linguistics
and linguistics in general. One is how to interpret low-frequency phenomena.
What we find interesting here is that even very low-frequency
syntactic constructions can have a consistent discourse profile.
By this we mean that while gapping can be deemed “unfit” for many
communicative tasks, especially face-to-face interaction, it nevertheless
has its own uses in written (especially journalistic and literary) genres
because of its unique and thus highly marked structure. These findings
highlight the importance of examining many different discourse genres
to achieve a proper understanding of syntax and grammar. They also
raise interesting questions about how large a corpus one needs
to study infrequently occurring linguistic items.

The development of large corpora such as the British National Corpus
(BNC) and the Bank of English Corpus was in part motivated by the
belief that certain linguistic phenomena could only be studied adequately
if large datasets were examined. For lexis, this is certainly true,
since many vocabulary items will be missed if only smaller corpora
such as the Brown Corpus are studied. In the case of particular grammatical constructions,
however, our study suggests that information about their
form and function can be gained even if relatively few examples are
found (for similar ideas, see Carter and McCarthy 1995). Of course, we
would be more confident about our conclusions if we had examined, say,
400 instances of gapping rather than 120. Nevertheless, additional examples
would very likely repeat the patterns
we have observed in ICE-GB, namely that gapping is confined to writing
rather than speech, and within writing to certain genres, such as press reportage.
But to study gapping, even in a relatively small corpus such as ICE-GB,
it was crucial that we work with a dataset containing extensive
grammatical annotation. The extent to which corpora should be annotated
has inspired considerable debate within corpus linguistics. While the
decision to annotate ICE-GB extensively was based on the premise that
“the comparison of corpora containing just raw text cannot go beyond
linguistically rather trivial observations” (Aarts 1992: 181), others
have argued that such annotation pre-biases one’s analysis, and that it
is more desirable that a corpus be created “in raw form and analyse[d]
fresh each time some analysis is required” (Sinclair 1992: 384). Both
views have merit. How one analyzes ICE-GB, and ultimately what one
finds, will certainly be determined by the manner in which the corpus has
been tagged and parsed. It is worth noting, however, that had ICE-GB
not been annotated, studying gapping in it would have required the
kind of painstaking manual analysis that linguists such
as Otto Jespersen conducted over a century ago.

Finally, although formalists and corpus linguists will probably never
reach a consensus on the value of intuition-based vs. corpus-based analyses,
we view these approaches as complementary, not contradictory. Had
we not had the benefit of the vast number of gapped constructions discussed
in the formalist literature on gapping, we would not have known
what to look for in ICE-GB. But not finding the kinds of
constructions described by formalists led us to consider not just why we
found what we did but why we did not find what we had expected to
find. This is the kind of analysis that Fillmore (1992) aptly characterizes as
“computer-aided armchair linguistics.”

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora