statistical jargon

Victor Steinbok aardvark66 at GMAIL.COM
Sat Jun 2 00:05:10 UTC 2012


It is fairly rare that we can definitively attribute specific coinages
to individuals. This, however, is not the case in statistical jargon.
Many modern terms used in statistics can be traced to the individuals
who coined them. However, the OED seems to miss much of the jargon (perhaps
there is an Oxford Information Sciences Dictionary where they all might
otherwise go). Not only that, even when an opportunity presents itself,
OED misses it.

Bit, n.4
> Etymology: Abbrev. of /binary digit/.
> A unit of information derived from a choice between two equally
> probable alternatives or ‘events’; such a unit stored electronically
> in a computer.
> 1948 C. E. Shannon in /Bell Syst. Techn. Jrnl./ July 380 The choice of
> a logarithmic base corresponds to the choice of a unit for measuring
> information. If the base 2 is used the resulting units may be called
> binary digits, or more briefly /bits/, a word suggested by J. W. Tukey.

I included the first quotation for obvious reasons. It describes the
coinage of the term. It also specifically mentions the person who came
up with it. This is not at all surprising. Tukey was at Princeton
with Richard Feynman, and the two of them were notorious
for coinages in their respective fields. Feynman's coinages are a
matter for a detailed study, but Tukey is a known inventor of
terminology for virtually all modes of graphical representation we use
today, including, in particular, box-and-whiskers plots and
stem-and-leaf plots. In fact, he invented both the representation and
the term for it. Although the older generation may be scratching their
heads at these, both of these terms and several other Tukey coinages are
now taught in middle and secondary school in the US as a matter of
routine. As such, they warrant inclusion, especially as the terminology
may not be familiar to those who have not gone through such education
recently or in the US. Tukey gets 11 mentions in the OED, but only 6
refer to the right Tukey (John, the statistician). Two of them are
covered by the above quote. Another also identifies a coinage.

Nyquist
> 4. b. Nyquist frequency n. (also Nyquist limit) the maximum frequency
> of signal or of data distribution that can be uniquely recovered in a
> sampled channel without risk of aliasing, equal to half the sampling
> frequency.
> 1963 /Philos. Trans./ (Royal Soc.) A. *255* 512 Any oscillations whose
> frequency exceeds the Nyquist frequency (one-half the sample
> frequency) will be indistinguishable from some lower frequency. Tukey
> calls this effect ‘aliasing’.

Aliasing shows up in one more quotation but has no definition.
Antialiasing (or anti-aliasing) shows up in four more quotations, but
also without a definition. A few years ago, this would have been
acceptable, but not today. Note that the term is now used well outside
its originally coined meaning.
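
For readers who have not met the effect described in the 1963 quotation,
here is a minimal sketch of it in Python. The sampling rate, the test
frequencies, and the helper names (sample, alias_of) are my own
illustrative choices, not anything taken from the OED or from Tukey. It
samples a 7 Hz cosine at 10 samples per second (Nyquist frequency 5 Hz)
and checks that the samples match those of a 3 Hz cosine.

```python
import math

def sample(freq_hz, rate_hz, n):
    """Return n samples of cos(2*pi*freq_hz*t) taken rate_hz times per second."""
    return [math.cos(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

def alias_of(freq_hz, rate_hz):
    """Frequency in [0, rate_hz/2] whose cosine samples match those of freq_hz."""
    folded = freq_hz % rate_hz
    return min(folded, rate_hz - folded)

rate = 10.0                      # hypothetical sampling rate; Nyquist frequency is 5 Hz
high = sample(7.0, rate, 20)     # 7 Hz exceeds the Nyquist frequency
low = sample(alias_of(7.0, rate), rate, 20)  # the lower frequency it folds onto

# True if the two sampled signals cannot be told apart
same = all(abs(a - b) < 1e-9 for a, b in zip(high, low))
print(alias_of(7.0, rate), same)
```

A cosine is used rather than a sine because the folded sine would come
back with its sign flipped, which would obscure the point.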

Another coinage seems to be mishandled completely.

Spectral, adj.
> spectral analysis n. Chemical analysis of substances by means of their
> spectra; analysis of light or another oscillating system into a spectrum.
> 1862 /Amer. Jrnl. Sci./ *84* 404 There are few branches of science
> which promise more magnificent results than the spectral analysis.
> 1888 /London, Edinb. & Dublin Philos. Mag./ 5th Ser. *25* 343
> (/heading/) Mathematical spectral analysis of magnesium and carbon.
> 1930 /Proc. IRE/ *18* 1199 Expression (9) lends itself to spectral
> analysis into its component frequencies by the following process.
> 1978 /Nature/ 16 Mar. 232/2 As a further step, we carried out a
> spectral analysis according to the techniques of Blackman and Tukey on
> the time series for each of our latitude bands.

Note that the last quote mentions Tukey. In fact, it does not refer to
the same kind of spectral analysis that is identified in the definition.
In this case, it is a purely statistical technique (hence Tukey's name).
See the Wikipedia disambiguation page for "spectral analysis" for details.

Another may or may not be a coinage.

Momentless
> 3.b. /Statistics/. Of a probability distribution: having moments
> (moment n. 8d) equal to zero.
> 1946 /Ann. Math. Statistics/ *17* 381 Condition (A) is satisfied for
> certain statistics even if their distribution functions are as
> momentless as the startling distributions constructed by Brown and Tukey.

Yet another includes a quote from Tukey and another that clearly
identifies him as the coiner.

Jackknife, n.
> 5. /Statistics/. A versatile method of reducing the bias of estimates
> and assessing their variability, using subsets of the available data
> which are often obtained by deleting a single value from the complete
> set.
> 1964 /Ann. Math. Statistics/ *35* 1594 Tukey adopted the name
> ‘jackknife’ [in 1958] for this procedure, since a boy scout's
> jackknife is symbolic of a rough-and-ready instrument capable of being
> utilized in all contingencies and emergencies.
> 1968 Mosteller & Tukey in Lindzey & Aronson /Handbk. Social Psychol./
> (ed. 2) II. x. 134 The mean of results based on several subsamples is
> likely to be more biased than is a single result based on all the
> data, at least to the extent that the individual samples are small. A
> method with wide application, intended to ameliorate these problems,
> is the jackknife.
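
To make the definition concrete, here is a rough sketch of the
delete-one form of the procedure in Python. The function names and the
data are invented for illustration, and this is only the standard
textbook leave-one-out recipe, not a reconstruction of the method in
either quoted source. It applies the jackknife to the divide-by-n
variance, a biased estimator, and returns a bias-corrected estimate
together with a standard error.

```python
import math
from statistics import mean

def jackknife(data, estimator):
    """Delete-one jackknife: return (bias-corrected estimate, standard error)."""
    n = len(data)
    full = estimator(data)
    # Recompute the estimate n times, each time leaving out one observation.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = mean(leave_one_out)
    bias_corrected = n * full - (n - 1) * loo_mean
    std_err = math.sqrt((n - 1) / n * sum((x - loo_mean) ** 2 for x in leave_one_out))
    return bias_corrected, std_err

def plug_in_variance(xs):
    """Variance with divisor n, which underestimates the population variance."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up sample
estimate, std_err = jackknife(data, plug_in_variance)
print(plug_in_variance(data), estimate, std_err)
```

For this particular estimator the bias correction reproduces the usual
divide-by-(n - 1) sample variance, which makes it a convenient check.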

The final one lists Tukey's paper as the earliest (so far) source, but
it's not clear whether he coined it. In fact, in this case, the term has
nothing to do with statistics.

Ditto
> dittoed adj. reproduced by a Ditto machine.
> 1955 /Biometrics/ *11* 42 Tukey, J. W., ‘The Problem of Multiple
> Comparisons’, unpublished dittoed notes, Princeton University, 396
> pp., 1953.

Perhaps it's too much to expect from a dictionary, but, if the coiner is
known, I'd like to see it in the article.

VS-)

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org


