[Corpora-List] Which Statistical Test is Suitable

chris brew cbrew at acm.org
Wed Jul 13 15:17:03 UTC 2011


I partially agree with Geoffrey Sampson's points. It is certainly true that
a table of numbers, in isolation, tells you nothing about the question you
are asking, for the reasons that Professor Sampson gives. And statistical
tests will not change this situation. To make progress, you need to be
precise about what you intend to count as a "spelling error". You could, for
example, reframe the problem as "how likely is it that the numbers that we
observe are due to random mistakes in typing?", then proceed to make a
mathematical model of typing errors. Or you could contrast the typing error
hypothesis with an alternative hypothesis and frame the question as "Are the
numbers that we observe more likely to be the result of typing errors or
more likely to be due to the existence in the writing population of two
groups of people, one of which always tries to spell the word one way, and
one of which tries to spell it the other way?". It will take some clear
thinking to get this comparison right, because you have to make a precise
quantitative judgement on things like the prior probability of finding
groups that spell differently in the way we hypothesize. From experience of
US/UK spelling differences, I believe that it would be a tricky and subtle
matter to come up with suitably precise and useful hypotheses. No surprise
there: as linguists, we are used to working with challenging and complex
data.
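
As a minimal sketch of such a comparison, assume (hypothetically) that the
counts are available per writer rather than only as one pooled table, that
every token is produced independently, and that slips happen at a fixed
rate. None of these assumptions comes from the original query. The names
eps, share_b and prior_two_groups, and all of the numbers, are invented
for illustration. The code compares a "typing error" model with a "two
groups" model and combines the result with a prior.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)


def log_lik_typing_error(counts, eps):
    """All writers aim at variant A; variant B appears only as a slip (rate eps)."""
    return sum(log_binom_pmf(b, a + b, eps) for a, b in counts)


def log_lik_two_groups(counts, eps, share_b):
    """Each writer belongs to an A-group or a B-group; both slip at rate eps."""
    total = 0.0
    for a, b in counts:
        n = a + b
        from_a_group = math.log(1 - share_b) + log_binom_pmf(b, n, eps)
        from_b_group = math.log(share_b) + log_binom_pmf(b, n, 1 - eps)
        # Sum the two mixture components in log space (log-sum-exp).
        m = max(from_a_group, from_b_group)
        total += m + math.log(
            math.exp(from_a_group - m) + math.exp(from_b_group - m)
        )
    return total


def posterior_log_odds(counts, eps, share_b, prior_two_groups):
    """Log posterior odds of the two-groups model over the typing-error model."""
    log_bayes_factor = (
        log_lik_two_groups(counts, eps, share_b)
        - log_lik_typing_error(counts, eps)
    )
    log_prior_odds = math.log(prior_two_groups) - math.log(1 - prior_two_groups)
    return log_prior_odds + log_bayes_factor


# Invented per-writer counts: (tokens spelled as variant A, tokens as variant B).
clustered = [(20, 0), (18, 1), (0, 15), (25, 0), (1, 12)]
scattered = [(20, 1), (18, 0), (25, 1), (30, 0)]

for name, counts in [("clustered", clustered), ("scattered", scattered)]:
    odds = posterior_log_odds(counts, eps=0.02, share_b=0.3, prior_two_groups=0.1)
    print(name, "log posterior odds for two groups:", round(odds, 2))
```

A positive value favours the two-groups hypothesis and a negative value
favours the typing-error hypothesis. The answer depends directly on the
values chosen for eps, share_b and the prior, which is exactly the part
that has to come from a careful analysis of the problem rather than from
the table of counts.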

But, if you do manage to set up sufficiently precise hypotheses, and
associate numbers with the hypotheses, statistical reasoning definitely can
help. That's what it is for. This kind of thinking is the basis for all
statistical tests that I am aware of. What you are never going to find is a
statistical test that frees you from the necessity of making (or finding in
the work of other scholars) a precise and careful analysis of the problem
you are trying to solve.

Chris

On Wed, Jul 13, 2011 at 10:34 AM, Geoffrey Sampson <grs2 at sussex.ac.uk> wrote:

> Dear Muhammad Shakir Aziz,
>
> I don't see that anyone else has responded to your query, so let me do so,
> rather late.  I would say that no kind of statistical test could possibly
> indicate whether variant spellings were errors, or allowable alternatives;
> because this question is not to do with numbers.  It is a question about
> where authority over the norms of the language you are concerned with is
> felt to lie, and what that authority says about orthography.  Some
> languages, at some periods, tolerate a wide variety of alternative
> spellings for given words, while other languages (or the same languages at
> other periods) may have extremely tightly-defined norms and strong social
> sanctions against violating them.  Carrying out statistical calculations on
> tables of the incidence of alternatives would not tell you anything about
> this, I believe.
>
> Geoffrey Sampson
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
Chris Brew, Ohio State University