Corpora: Historical background of Corpus Linguistics

John McKenny jmck at dgest.estv.ipv.pt
Fri Apr 19 12:27:39 UTC 2002


Dear Charo
My money's on Jonathan Swift as a clear precursor of corpus methodology. In
Gulliver's Travels (1726) Part III, Ch 5 he describes a machine which
generates books of 'philosophy, poetry, politics, law, mathematics and
theology without the least assistance from genius or study'. The professor
of the Academy of Lagado who invented this engine (worked by 40 pupils who
cranked handles and transcribed the output) told Gulliver that he had
"emptied the whole vocabulary into the frame and made the strictest
computation of the general proportion there is in books between the number
of particles, nouns, and verbs, and other parts of speech". It is well
known that Swift is a/the master of satire and that he was having a go at
the Royal Society in this passage but he shows that he had thought through
a lot of what would later become AI. He also draws attention, through the
professor of Lagado Academy, to the importance of prefabs in building or
reconstituting text.
In his introduction to "A complete collection of genteel and ingenious
conversation" (1738) Swift takes up prefabs again and describes how he
built up a collection of fashionable sayings over 12 years' field work: "I
determined to spend 5 mornings, to dine 4 times, pass three afternoons, and
six evenings every week in the houses of the most polite families...I
always kept a large table-book in my pocket; and as soon as I left the
company I immediately entered the choicest expressions." He then spent a
further 16 years "digesting it into a method". Finally he sat on his work
for a further six or seven years, observing: "I have not been able to add
above nine valuable sentences to enrich my collection; from whence I
conclude that what remains will amount only to a trifle".
Nowadays the blurb of Swift's collection of smart chat might state that
it was based on the author's own corpus which was more than 30 years in the
making.
A passage in Section 1, the Introduction, of an earlier work, A Tale of a
Tub (1704), provides further justification for making Swift the patron
saint of Corpus Linguistics, if such an honour were not anathema (wrong
word!) to the Dean's stern Protestant anti-popery. Or, at any rate, a prime
precursor.
"I am informed our ...rivals... challenge us to a comparison of books, both
as to weight and number...we are ready to accept the challenge..." Although
he's always playful and elusive, I think he shows a genuine fascination with
the quantificational, physical side of language. Have I been taken in by
one of the greatest hoaxers of all time?
Mucha suerte
John


John McKenny
Departamento de Gestão
Escola Superior de Tecnologia de Viseu
Campus Politécnico
3500 Viseu
Portugal
jmck at dgest.estv.ipv.pt


