Corpora: Statistical test procedures in quantitative stylistic analysis

Wed Apr 11 09:04:21 UTC 2001

Dear list-members,

I am working on a doctoral thesis in German Medieval Studies
and I am turning to you with a question concerning statistical test
procedures in computer-aided quantitative stylistic analysis.

The aim of my study is to develop a suite of programs for
examining Medieval German epics, or passages from them, for
statistical differences. In the second part of my thesis I want to
demonstrate some applications of these programs. The overall
aim, as in most projects in literary criticism that use
quantitative stylistic analysis, is to provide statistical evidence
that supplements the arguments of literary scholarship.

The programs cover a wide range of distinguishing features:
simple quantitative data such as word and verse length,
frequencies of vowels and consonants, some stylistic devices that
can easily be captured automatically, function words, particularly
frequent words and word combinations, and some syntactic and
metrical parameters.
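As an illustration of the simplest of these features, here is a minimal Python sketch that computes the word count, mean word length and share of vowels for a single verse. The function name `verse_features` is hypothetical, and the vowel inventory (normalized Middle High German spelling with circumflexes, æ and œ) is an assumption; an edition with a different orthography would need a different vowel set.

```python
import re

# Assumed vowel inventory for normalized Middle High German spelling;
# adapt it to the orthography of the edition actually analysed.
VOWELS = set("aeiouäöüâêîôûæœ")

# A "word" here is any run of letters (digits and punctuation are ignored).
WORD_RE = re.compile(r"[^\W\d_]+")


def verse_features(verse):
    """Return word count, mean word length and vowel share for one verse
    (hypothetical helper, not part of the programs described above)."""
    words = WORD_RE.findall(verse.lower())
    letters = [ch for word in words for ch in word]
    n_letters = len(letters)
    return {
        "words": len(words),
        "mean_word_length": n_letters / len(words) if words else 0.0,
        "vowel_share": (sum(ch in VOWELS for ch in letters) / n_letters
                        if n_letters else 0.0),
    }


# Example: the first verse of the Nibelungenlied.
print(verse_features("Uns ist in alten mæren wunders vil geseit"))
```

Per-verse values like these can then be summed or averaged over larger text segments before any statistical comparison.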

I hope that my programs will contribute arguments to the
following questions:
-   In general: are there significant differences between the
    texts examined?
-   Does an author's style vary within his or her work, e.g.
    where the author draws on a literary model for parts of
    the text?
-   Can texts or passages by one author be assigned to the
    same or to different periods of his or her literary
    production?
-   Can texts of uncertain authorship be attributed to one or
    more authors?
To investigate the last two questions, several additional texts
will certainly have to be examined for comparison.

The programs are not intended for my own use only. I intend to
give them a structure and documentation that make it possible
for any medievalist to apply them, even without any knowledge of
programming. Users will be able to segment a given text, to adapt
the lists of function words and to define the scope of the
intended analysis.
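The segmentation step and the adjustable function-word list might look roughly like the following Python sketch. It assumes the text is already available as a list of verses. The helper names, the example function words and the choice to drop an incomplete final segment are illustrative assumptions, not a description of the actual programs.

```python
import re

WORD_RE = re.compile(r"[^\W\d_]+")


def segment(verses, size=100):
    """Split a list of verses into consecutive segments of `size` verses.
    An incomplete final segment is dropped (an assumption; it could
    equally be kept or merged with the previous segment)."""
    return [verses[i:i + size] for i in range(0, len(verses) - size + 1, size)]


def function_word_rate(segment_verses, function_words):
    """Occurrences of the given function words per 1000 words in one segment."""
    words = [w for verse in segment_verses for w in WORD_RE.findall(verse.lower())]
    if not words:
        return 0.0
    hits = sum(w in function_words for w in words)
    return 1000 * hits / len(words)


# The function-word list is plain data, so users can edit it without
# programming. (Hypothetical example list.)
FUNCTION_WORDS = {"unde", "und", "daz", "ez", "der", "diu", "in"}

verses = ["Uns ist in alten mæren wunders vil geseit"] * 250  # placeholder text
rates = [function_word_rate(s, FUNCTION_WORDS) for s in segment(verses)]
print(len(rates), rates[0])
```

Keeping the word list and the segment size as ordinary parameters is one simple way to let non-programmers change the scope of an analysis, for example through a configuration file.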

My question concerns the statistical test used to determine
whether the differences found between two compared texts or
samples are statistically significant.

Up to now I have been using the Wilcoxon-White test (also
called the Mann-Whitney test) as a test of statistical
significance. For this purpose, the program divides the texts to
be examined into segments of 100 verses each. For each segment,
the frequency of the stylistic feature in question is recorded, so
that the segments of both texts can be ranked by the frequency of
that feature; the test then checks whether the segments of one
text tend to rank higher than those of the other.
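The Mann-Whitney U statistic formalizes the ranking procedure just described. The following standard-library Python sketch computes U from the per-segment frequencies of two texts and derives a two-sided p-value from the usual normal approximation, with a correction for tied ranks. That approximation is usually regarded as adequate once both samples have roughly ten or more values. The sketch omits the continuity correction and exact small-sample tables, so its p-values may differ slightly from those of statistics packages. The function names and the example counts are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.
    x, y: per-segment frequencies of one feature in text A and text B.
    Returns (U, z, p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    # Variance of U, corrected for ties (t = size of each tie group).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:  # all values identical: no evidence of a difference
        return u, 0.0, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, z, p


text_a = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # hypothetical counts per 100 verses
text_b = [9, 11, 8, 10, 12, 9, 10, 8, 11, 9]
u, z, p = mann_whitney(text_a, text_b)
print(f"U = {u}, z = {z:.2f}, p = {p:.4f}, significant at 5%: {p < 0.05}")
```

The final comparison `p < 0.05` is the yes/no judgement at the 5% level: it is the only decision the user has to make.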

I chose this test because Adam Kilgarriff (among others)
recommended it ("Which words are particularly characteristic of
a text? A survey of statistical approaches",
http://www.itri.brighton.ac.uk/~Adam.Kilgarriff/publications.html#1996).
I preferred the Wilcoxon-White test to the log-likelihood test,
which is also recommended there, because I expect medium to
high frequencies for the stylistic features I want to examine in
rather long texts or text passages (at least 1000 verses).
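For comparison, the log-likelihood test works on the pooled counts of each whole text rather than on per-segment values. Below is a minimal Python sketch of the G² statistic for a single feature in a 2x2 table (occurrences of the feature vs. all other words, text A vs. text B), in the form commonly attributed to Dunning (1993). The function name and the example counts are hypothetical. With one degree of freedom, G² > 3.84 corresponds to p < 0.05.

```python
import math


def log_likelihood(freq_a, total_a, freq_b, total_b):
    """G2 statistic for one feature in two texts.
    freq_*: occurrences of the feature, total_*: size of the text (e.g. words).
    Returns (G2, p), with p from a chi-square distribution with 1 df."""
    observed = [[freq_a, total_a - freq_a],
                [freq_b, total_b - freq_b]]
    n = total_a + total_b
    row_sums = [total_a, total_b]
    col_sums = [freq_a + freq_b, n - freq_a - freq_b]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # 0 * log(0) is taken as 0
                expected = row_sums[i] * col_sums[j] / n
                g2 += o * math.log(o / expected)
    g2 = max(2 * g2, 0.0)  # guard against tiny negative rounding error
    p = math.erfc(math.sqrt(g2 / 2))  # upper tail of chi-square with 1 df
    return g2, p


g2, p = log_likelihood(50, 1000, 10, 1000)  # hypothetical counts
print(f"G2 = {g2:.2f}, p = {p:.2g}")
```

Because this test treats each text as one large sample, it does not use the variation between segments within a text. That variation is exactly the information a rank test over segments relies on.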

However, an essay by David I. Holmes has made me somewhat
unsure. In view of the many studies based on multivariate
methods in recent years, Holmes states:
"Principal Component Analysis is a standard technique in
multivariate statistical data analysis. [...] The trend towards
usage of multivariate statistical methods is now so established in
stylometry that it is unusual to find papers which do not use
them." (The Evolution of Stylometry in Humanities Scholarship,
LLC 13, 1998, pp. 113f.)
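PCA works differently from a significance test on a single feature. It takes a table of many features measured on many text segments and finds the weighted combination of features along which the segments vary most. The segments are then plotted on that axis (usually together with a second one) and inspected for clusters. The following standard-library Python sketch computes only the first principal component of the feature correlation matrix by power iteration. The function names, the fixed iteration count and the example table are assumptions; in practice an established statistics package would normally be used.

```python
import math


def standardize(rows):
    """Convert each feature column of a segments x features table to z-scores."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
           for j in range(m)]  # a constant feature gets sd 1 to avoid dividing by 0
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]


def first_principal_component(rows, iterations=500):
    """Loadings and segment scores of the first principal component,
    found by power iteration on the feature correlation matrix."""
    z = standardize(rows)
    n, m = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)]
            for i in range(m)]
    start = [math.sqrt(j + 2) for j in range(m)]  # arbitrary non-uniform start
    s = math.sqrt(sum(x * x for x in start))
    v = [x / s for x in start]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(m)) for r in z]
    return v, scores


# Hypothetical table: rows = segments, columns = features
# (mean word length, vowel share, function words per 1000 words).
table = [[4.1, 0.38, 120], [4.3, 0.37, 118], [4.0, 0.39, 131],
         [4.6, 0.35, 105], [4.5, 0.36, 101], [4.7, 0.34, 99]]
loadings, scores = first_principal_component(table)
print([round(x, 2) for x in loadings])
print([round(x, 2) for x in scores])
```

The output is a set of loadings and coordinates to be interpreted, not a p-value. This is one reason PCA demands more statistical judgement from the user than a test that returns a single significance decision.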

I am now uncertain how effective the Wilcoxon-White test is,
and whether ‘unusual’ here should be read as ‘wrong’ or
‘anachronistic’. I should be extremely grateful for any ideas or
suggestions on this topic.

On the one hand, I want to apply an adequate test procedure; on
the other, I cannot claim to fully understand PCA. PCA would
moreover conflict with my intention to make the programs
accessible to many fellow medievalists, because, as far as I can
see, some knowledge of statistics is required not only to
implement the procedure but also to interpret its results. The
Wilcoxon-White test seems considerably easier to handle: it only
requires the user to judge whether two texts differ significantly
with respect to a given feature (that is, at the 5% significance
level, p < 0.05) or not.

I would be grateful for any comments.

Friedrich Michael Dimpel

Friedrich Michael Dimpel M.A.
Institut für Germanistik
Bismarckstr. 1, 91054 Erlangen
Tel./Fax: 09131-85 22186 (10:00-12:00)
fhdimpel at phil.uni-erlangen.de