[Corpora-List] Which Statistical Test is Suitable

Krishnamurthy, Ramesh r.krishnamurthy at aston.ac.uk
Sun Jul 17 14:02:25 UTC 2011


Now that the focus of the discussion has moved away from statistics to linguistics, and people
have started to question the hypothesis on which the original question was based, I feel more able to contribute... :)

I agree with comments that mention diachronic aspects (vintage, language change), social aspects (variety, subgroup identity),
and researcher's perspective/goal...

Yorick: the different frequencies of ceiling(/cieling) and piece(/peice) raise some interesting questions.
You did not reveal the Google figures you obtained, so I did a quick check:

"ceiling" About 223,000,000 results (0.20 seconds) = c. 99.7% [COCA: 11846; BYU-BNC: 2208]
"cieling" About 738,000 results (0.13 seconds) = c. 0.3% [COCA: 4; BYU-BNC: 0]

"piece" About 1,290,000,000 results (0.11 seconds) = c. 99.2% [COCA: 45038; BYU-BNC: 9027]
"peice" About 10,800,000 results (0.13 seconds) = c. 0.8% [COCA: 3; BYU-BNC: 1]

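For anyone who wants to recheck the percentages, here is a minimal Python sketch. It assumes each variant's share is simply its hit count divided by the combined hits for the pair. The function name and the dictionary layout are illustrative only, and Google's "About N results" figures are rough estimates rather than true corpus counts.

```python
def misspelling_share(correct_hits, variant_hits):
    """Return the variant's share of all hits for the pair, as a percentage."""
    total = correct_hits + variant_hits
    return 100.0 * variant_hits / total


# Google hit counts quoted above: (correct spelling, misspelling).
google_hits = {
    "cieling": (223_000_000, 738_000),
    "peice": (1_290_000_000, 10_800_000),
}

for variant, (correct, wrong) in google_hits.items():
    share = misspelling_share(correct, wrong)
    print(f"{variant}: {share:.2f}% of the pair's hits")
```

The same function can be applied to the COCA and BYU-BNC counts, though with single-digit counts for the misspellings the resulting shares are very unstable.
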
a) to what extent does the higher frequency of piece : ceiling = c. 6:1 in Google (slightly less in COCA/BYU-BNC)
mean there are more opportunities for 'error' (of whatever type) in the spelling of 'piece'?

b) the extremely low frequencies of cieling/peice  in COCA/BYU-BNC may represent cleanup/normalisation, or text-type restriction?

c) to what extent do people take more or less care over the spelling of a word the rarer it is (eg ceiling);
are more educated/specialist writers involved in its use, as opposed to the total speech community?

d) the 'correct' spelling is the reverse in the 2 examples you chose; do the absolute/relative positions of
the characters on QWERTY keyboards, or mobile phone SMS keys, affect the occurrences?

e) which occurrences are 'typos', which are 'errors', and which are deliberate choices (eg 'mentions' rather than 'uses'
in linguistic contexts like this posting or in classroom spelling correction exercises; or in humorous writing, etc)?
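
On question (a), one way to separate 'more opportunities' from 'more error-prone' is to divide the ratio of the raw misspelling counts by the ratio of the correct-spelling counts. The sketch below does this with the Google figures quoted above. The variable names are my own, and it assumes the hit counts can be treated as comparable estimates, which is far from certain.

```python
# Google hit counts quoted above (estimates, not exact corpus frequencies).
ceiling, cieling = 223_000_000, 738_000
piece, peice = 1_290_000_000, 10_800_000

# How many times more 'peice' hits there are than 'cieling' hits.
absolute_ratio = peice / cieling

# How many times more opportunities there are to misspell 'piece'.
opportunity_ratio = piece / ceiling

# What remains once the difference in opportunities is factored out:
# the ratio of misspellings per correct spelling.
rate_ratio = (peice / piece) / (cieling / ceiling)

print(f"absolute: {absolute_ratio:.1f}x")
print(f"opportunities: {opportunity_ratio:.1f}x")
print(f"per-opportunity rate: {rate_ratio:.1f}x")
```

By construction the absolute ratio is the product of the other two, so any gap between the raw counts that is not explained by the frequency of the correct form shows up in the last figure.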

best
ramesh


Ramesh Krishnamurthy
Visiting Academic Fellow, School of Languages and Social Sciences, Aston University, Birmingham B4 7ET
Room: NX01. Tel: 0121-204-3812.
Director, ACORN (Aston Corpus Network project): http://acorn.aston.ac.uk/
Project Investigator, GeWiss (Volkswagen Foundation) project: http://www1.aston.ac.uk/lss/research/research-projects/gewiss-spoken-academic-discourse/
------


Date: Sat, 16 Jul 2011 17:06:30 +0100
From: Yorick Wilks <Y.Wilks at dcs.shef.ac.uk>
Subject: Re: [Corpora-List] Which Statistical Test is Suitable
To: Angus Grieve-Smith <grvsmth at panix.com>
Cc: corpora at uib.no

I'm not sure if this contributes anything to the discussion or not, but there is clearly a distinction between, on the one hand, acceptable variants in spelling, where the balance shifts over time, e.g. judgement/judgment (especially as that one is not a US/UK distinction, which is a quite separate issue), and, on the other, misspellings, among which there is huge variation, even for seemingly similar "errors".

For example, if you compare on Google ceiling/cieling with piece/peice (where, importantly, the wrong versions are non-words in both cases, and where both seem to be "e/i reversal"), you find that one error is a hundred times commoner than the other. I'm not sure what that tells us about anything, such as mastery of explicit rules and their exceptions or not, as the case may be.

Yorick Wilks



On 16 Jul 2011, at 15:51, Angus Grieve-Smith wrote:



> On 7/14/2011 3:36 AM, True Friend wrote:

>> As you can see the frequencies are closely related, my aim was to summarize the group behaviour. The point here is to show the general public's usage, that despite of rules available, people are confused in spelling of these words.

>

>    It's rarely just a case of "people are confused."  Croft (2000) talks about cases where a single community uses two different variants.  He is referring primarily to morphological or syntactic variation, but I think this also applies to spelling variation.  There are three possible outcomes:

>

> 1. The alternative forms are reassigned to different functions so that they are no longer in competition.

> 2. The variation is reinterpreted as corresponding to a division of the community.

> 3. The community gradually shifts towards the use of one variant or the other.

>

>    It's still not clear to me what statement you're trying to make, who you're trying to convince, and what your ultimate political goal is.  Whose usage do they really care about?

>

> Croft, William. 2000. Explaining Language Change: An Evolutionary Approach. London: Longman.

>

> --

>                       -Angus B. Grieve-Smith

>                       grvsmth at panix.com


