Corpora: minimum size of corpus?

ramesh at ramesh at
Thu Feb 10 01:08:19 UTC 2000

If there are only 10 chapters, 276 verses, of Biblical Aramaic extant,
then that's the biggest corpus of Biblical Aramaic the world is ever going
to see.
I don't know how many "words" there are in an average verse, but say there
are 20, you'll have a corpus of c. 5,520 words (276 x 20). You may be able to discover
some interesting features in the word-frequency list, especially by comparing
the list with word frequencies for other small corpora of similar size, and
especially other corpora of similar content, in Aramaic or other languages.
You may also be able to find interesting features in repeated phraseologies,
again more so with contrastive studies.
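
A minimal sketch of the kind of comparison I mean, in Python. Everything here is an assumption for illustration: the function names are made up, the two sample strings are invented English stand-ins (not Aramaic data), and the tokeniser is deliberately crude. It builds a word-frequency list normalised per 1,000 tokens, so that two small corpora of different sizes can be set side by side, and it lists word sequences that occur more than once.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on runs of non-word characters (a crude assumption)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def relative_frequencies(tokens, per=1000):
    """Map each word to its frequency per `per` tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * per / total for word, count in counts.items()}


def repeated_ngrams(tokens, n=3, min_count=2):
    """Return n-word sequences that occur at least `min_count` times."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}


# Invented sample texts, standing in for two small corpora.
corpus_a = "the king spoke and the king said to the wise men"
corpus_b = "the scribe wrote and the scribe read to the people"

tokens_a = tokenize(corpus_a)
tokens_b = tokenize(corpus_b)
freq_a = relative_frequencies(tokens_a)
freq_b = relative_frequencies(tokens_b)

# Side-by-side frequency list; a word absent from one corpus shows as 0.
for word in sorted(set(freq_a) | set(freq_b)):
    print(f"{word:10s} {freq_a.get(word, 0):8.1f} {freq_b.get(word, 0):8.1f}")

# Repeated two-word phraseologies in the first corpus.
print(repeated_ngrams(tokens_a, n=2))
```

With only a few thousand words, most items in such a list will occur once or twice, so differences between corpora should be read with caution.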
Forensic linguistics has been looking at the problems of using quantitative
methods for short texts (suicide notes, threatening letters, etc.) and
small corpora (small sets of witness statements, one of which may be disputed,
etc.). Some colleagues at Birmingham may have clearer ideas on this.
But software tools that use statistical methods tend to yield more
reliable results when applied to larger corpora, as far as I understand
the maths involved (which isn't very far!).

ramesh Krishnamurthy
Corpus Research Group
University of Birmingham
