Corpora: Historical background of Corpus Linguistics

Ute Römer ute.roemer at uni-koeln.de
Thu Apr 18 13:13:11 UTC 2002


Dear Eric and others,

Some early (or earlier) pre-electronic corpus-based studies that come to mind
are E.L. Thorndike's 1921 work on relative word frequencies (if I remember
correctly, he used a corpus of more than 4 million words!): The Teacher's Word
Book. New York: Columbia Teachers College, and (much earlier) A. Cruden's 1796 (!)
Complete Concordance to the Old and New Testaments. Also worth mentioning is the
(later) work by Michael West: 1953, A General Service List of English Words.
London: Longman. And then of course there is Otto Jespersen, who also used corpus
data to compile his grammar, published from 1909 to 1949.

I can't help with the 'where-to-get-Markov's-paper-problem' but I can recommend the
site www.abebooks.com where I got my own Zipf and my own Firths!

Hope this helps!

Best wishes.... Ute


Eric Atwell wrote:

> Ramesh said:
> > ... perhaps *the* earliest publication of linguistic research using an
> > electronic corpus was: ...
>
> ...but don't forget even earlier Corpus Linguistics research done
> without computers.  For example modern Language Engineering researchers
> extract Zipf distributions and Markov models from corpora; this was
> done earlier "by hand" :
>
> Zipf, George Kingsley (1936) "The psycho-biology of language : an
> introduction to dynamic philology" London : G. Routledge & sons
>
> Markov, A.A. (1913) "Essai d'une recherche statistique sur le texte du
> roman 'Eugene Onegin' illustrant la liaison des épreuves en chaîne"
> Izvestia Imperatorskoi Akademii Nauk (Bulletin de l'Academie Imperiale
> des Sciences de St-Petersbourg) 7:153-162.
>
> Does anyone have an earlier citation???
>
> Eric Atwell
>
> PS Leeds library has the Zipf book but I don't actually have a copy of the Markov paper,
> I copied the citation from Jurafsky&Martin(2000) "Speech and Language
> Processing" Prentice Hall - can someone let me have a copy please PLEASE?
>
> --
> Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
> School of Computing, University of Leeds, LEEDS LS2 9JT
> TEL: 0113-2335430  MOBILE: 0775-1039104 FAX: 0113-2335468
> WWW: http://www.comp.leeds.ac.uk/eric  EMAIL: eric at comp.leeds.ac.uk


