jbybee at UNM.EDU
Wed Aug 25 16:03:02 UTC 2004
I agree with Martin that replication in typology/universals would mean
selecting a new sample of languages and testing the hypothesis on that
sample. I also agree that there is not much glory in doing that if
nothing new is learned. However, a research group could certainly make
replication of a study a part of a larger study that also tests other
hypotheses.
Martin also says:
But reproducibility would show that typology really is a science. Maybe
one reason people do not do it is that they are not so convinced that it
is not art after all. Especially the application of definitions to
different languages is not something that can be done totally
mechanically, and it is hard to get rid of the element of subjectivity.
For instance, Bybee, Perkins & Pagliuca 1994 ("The evolution of
grammar") note on p. xvii that they had to work very hard to ensure that
the coding by the three authors yielded similar results. They were happy
when they had reached ninety percent inter-coder reliability.
Sciences that use quantitative data are used to probabilistic results.
90% replicability is quite high for a social science. One benefit of
having lots of data (the GRAMCATS project coded 2187 grams) and using
statistical methods is that some noise in the data can be tolerated.
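As a rough illustration of the 90% figure discussed above (this is a sketch, not code from the GRAMCATS project; the category labels and codings are invented), percent inter-coder agreement is simply the share of items that two coders assigned to the same category:

```python
# Hypothetical sketch: percent inter-coder agreement on categorical codings.
# The labels and data below are invented for illustration only.

def percent_agreement(coder_a, coder_b):
    """Share of items that two coders assigned the same category."""
    if len(coder_a) != len(coder_b):
        raise ValueError("coders must rate the same set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

# Two coders classify ten grams; they disagree on one item.
a = ["future", "perfective", "future", "habitual", "future",
     "perfective", "habitual", "future", "perfective", "habitual"]
b = ["future", "perfective", "future", "habitual", "future",
     "perfective", "habitual", "future", "future", "habitual"]

print(percent_agreement(a, b))  # 0.9, i.e. 90% agreement
```

With thousands of coded items, as in GRAMCATS, a 10% disagreement rate of this kind behaves as noise that statistical analysis can tolerate.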
What would make typology a science is not 100% reliability in coding,
but rather the procedure of formulating hypotheses and testing them on
appropriate samples of languages. While I see a lot of
hypothesis-formulation, there is much less hypothesis-testing. Of course,
the reason for that is that it is very difficult and time-consuming. But
that is really no excuse!
I find it interesting that you (Martin) view replicability as doing a study
using a completely different sample. Is this the normal meaning? I
would be more concerned to know if another set of researchers would
have reached the same conclusions with the SAME sample. You mention
Bybee et al striving for 90% coding agreement, but they were already
working closely in the same lab and had discussed issues and would
naturally have reached some common ground before taking on the task
of coding. Imagine another lab making very different decisions (and
getting its 90% coding agreement). How much would this affect the
results?
To replicate a study, a new set of researchers would have to use the
same set of criteria. The criteria used in the Bybee et al. GRAMCATS
project can be found in Chapter 2 of Bybee et al. 1994. A more detailed
version was written up in a Coding Manual. It is quite easy to follow in
most cases and someone new to the project could easily achieve 90%
reliability. For instance, Martin Haspelmath came into the project
midstream and coded at least two languages with little difficulty (as I
remember it). The difficulties only arise because some linguistic
categories are not discrete and sometimes the information was
incomplete. As I said above, however, given the quantity of data the
noise introduced by these factors was insignificant.
So my opinion is that replication should take place on a different
sample of languages, using the same criteria.