9.1476, Sum: Addendum to GoldVarb Summary

Thu Oct 22 14:20:43 UTC 1998

LINGUIST List:  Vol-9-1476. Thu Oct 22 1998. ISSN: 1068-4875.

Subject: 9.1476, Sum: Addendum to GoldVarb Summary

Moderators: Anthony Rodrigues Aristar: Wayne State U.<aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie: U. of Arizona <carnie at linguistlist.org>

Reviews: Andrew Carnie: U. of Arizona <carnie at linguistlist.org>

Associate Editors:  Martin Jacobsen <marty at linguistlist.org>
                    Brett Churchill <brett at linguistlist.org>
                    Ljuba Veselinova <ljuba at linguistlist.org>

Assistant Editors:  Scott Fults <scott at linguistlist.org>
                    Jody Huellmantel <jody at linguistlist.org>
                    Karen Milligan <karen at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Chris Brown <chris at linguistlist.org>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/

Editor for this issue: Scott Fults <scott at linguistlist.org>

=================================Directory=================================

1)
Date:  Thu, 22 Oct 1998 15:39:17 +0300
From:  sigley at ic.daito.ac.jp (Robert Sigley)
Subject:  GoldVarb (addendum)

-------------------------------- Message 1 -------------------------------

Date:  Thu, 22 Oct 1998 15:39:17 +0300
From:  sigley at ic.daito.ac.jp (Robert Sigley)
Subject:  GoldVarb (addendum)

In writing to Mario, I referred to (and included a copy of) Ch.7 of my
PhD thesis [Sigley, R. 1997. Choosing Your Relatives: Relative Clauses
in New Zealand English. PhD thesis, Victoria University of Wellington,
New Zealand.]  This chapter compares logistic/Varbrul analysis with
more ordinary chi-squared tests on crosstabulated data; it's intended
as a practical guide to interpreting the GoldVarb output.

My email to Marco was a summary of that material, with additional
speculations, one of which was certainly wrong as stated (see below).
I write now so that anyone wishing to discuss details with me can do so
directly (email: Sigley at ic.daito.ac.jp).

(i) The number of degrees of freedom in a logistic or loglinear model =
(the number of independently estimated parameters - the number of fixed
parameters).

Question: Is this equal to (number of factors) - (number of factor groups),
as Avila states, or to (number of factors + 1) - (number of factor groups)?
In other words, does the 'input weight' (which is also iteratively
estimated) count?
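
The two candidate counts differ only by one. A minimal sketch in Python
shows both side by side; it does not settle which is correct. The
function name and the group sizes are invented for illustration, and it
assumes one parameter per factor group is treated as fixed, as in the
formula above.

```python
def degrees_of_freedom(factors_per_group, count_input=False):
    """Free parameters for a model with one weight per factor.

    One weight per factor group is treated as fixed, so each group
    contributes (number of factors - 1). Set count_input=True to add
    one more for the iteratively estimated input weight.
    """
    n_factors = sum(factors_per_group)
    n_groups = len(factors_per_group)
    return n_factors - n_groups + (1 if count_input else 0)


# Hypothetical design: three factor groups with 3, 2 and 4 factors.
groups = [3, 2, 4]
print(degrees_of_freedom(groups))                    # 6
print(degrees_of_freedom(groups, count_input=True))  # 7
```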

(ii) The comment I made in parentheses below is inaccurate.

>It is possible to use this method to incorporate several interaction
>effects into the model -- but it quickly becomes rather cumbersome, as you
>will often have to collapse distinctions in order to include the
>crossproduct factor group, and things get really messy when you need to
>consider several interactions involving the same factor group. (I think the
>best way to treat these is stepwise: if the most significant interaction is
>between groups 1 and 2, and you suspect there's also an interaction between
>groups 1 and 3, you can only approach it indirectly by comparing models
>containing 1*2, 3, 4,...n and 1*2*3, 4,...n. By contrast, if you try
>constructing a model containing 1*2, 1*3, 4,...n then you've effectively
>encoded the distinctions from group 1 twice, which means your model has
>redundant parameters and could produce unreliable results.)

Here I was trying to reconcile differences between what I know in theory
and what seems to work in practice, and managed a rather garbled account; a
fuller explanation follows.

Suppose we're comparing the models:

(a) 1*2, 3, 4, ... , n  (a model containing the interaction effect between
groups 1 and 2, but treating every other factor group as independent)

(b) 1*2, 1*3, 4, ... , n (a model containing independent interactions
between groups 1 and 2, and groups 1 and 3)

(c) 1*2, 1*3, 2*3, 4, ... , n (containing independent 2-way interactions
for groups 1 and 2, 1 and 3, 2 and 3)

(d) 1*2*3, 4, ... , n (containing the 3-way interaction for groups 1, 2 and 3)
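
As a rough illustration of how the four models differ in coding, the
sketch below recodes each token's factor codes into crossproduct factor
groups. It is Python, not GoldVarb itself: the token layout (one
single-character code per group), the "*" joiner, the name recode() and
the choice n = 4 are all assumptions made for the example.

```python
# Each model is a list of factor groups; each group names the original
# groups (0-based) whose codes are crossed to form it. Here n = 4.
MODELS = {
    "a": [(0, 1), (2,), (3,)],          # 1*2, 3, 4
    "b": [(0, 1), (0, 2), (3,)],        # 1*2, 1*3, 4
    "c": [(0, 1), (0, 2), (1, 2), (3,)],  # 1*2, 1*3, 2*3, 4
    "d": [(0, 1, 2), (3,)],             # 1*2*3, 4
}


def recode(tokens, spec):
    """Recode tokens so that each crossed group becomes one factor code."""
    return [
        tuple("*".join(token[i] for i in group) for group in spec)
        for token in tokens
    ]


# Two hypothetical tokens, coded for groups 1 to 4.
tokens = [("a", "x", "p", "m"), ("b", "y", "q", "m")]

for name, spec in MODELS.items():
    print(name, recode(tokens, spec))
```

Note that in models (b) and (c) the code from group 1 appears in more
than one of the recoded groups, which is the shared partitioning
discussed further on.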

In theory:

To test the significance of adding the 1*3 interaction to a model
containing the 1*2 interaction, you should compare models (a) and (b).

To test the significance of further adding the 2*3 interaction, you should
compare models (b) and (c).

To test the significance of the 3-way 1*2*3 interaction, you should compare
models (c) and (d).
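
Each of these comparisons can be sketched as a likelihood-ratio test,
assuming that is the comparison intended: twice the difference in
log-likelihood is referred to a chi-squared distribution whose degrees
of freedom equal the difference in the number of free parameters. The
Python below is a standard-library sketch; the function names, the
log-likelihoods and the parameter counts are invented for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then step up two df at a time.
    if df % 2:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(ll_small, k_small, ll_big, k_big):
    """Compare nested models given log-likelihoods and parameter counts."""
    df = k_big - k_small
    if df <= 0:
        # The test is undefined unless the larger model has more parameters.
        raise ValueError("the larger model must have more free parameters")
    g2 = 2.0 * (ll_big - ll_small)
    return g2, df, chi2_sf(g2, df)


# Hypothetical figures for two nested models.
g2, df, p = likelihood_ratio_test(-512.3, 11, -507.1, 15)
print(f"G2 = {g2:.2f} on {df} df, p = {p:.4f}")
```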

These models show increasing complexity, and an increasing number of
independently estimated parameters, in the order (a) < (b) < (c) < (d).
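
The ordering can be checked with a small Python sketch. It assumes every
cell is occupied and uses the usual hierarchical count: a term involving
several groups contributes the product of (number of factors - 1) over
those groups, and every lower-order term implied by an interaction is
included. The input weight is left out of the count, and the group sizes
are invented for illustration.

```python
from itertools import combinations
from math import prod


def independent_parameters(sizes, generators):
    """Count independent parameters, excluding the input weight.

    sizes: dict mapping group number -> number of factors.
    generators: the highest-order terms in the model, as tuples of groups.
    """
    terms = set()
    for gen in generators:
        # An interaction implies all of its lower-order terms.
        for r in range(1, len(gen) + 1):
            terms.update(combinations(sorted(gen), r))
    return sum(prod(sizes[g] - 1 for g in term) for term in terms)


# Hypothetical design with n = 4: groups 1-3 have 3 factors, group 4 has 2.
sizes = {1: 3, 2: 3, 3: 3, 4: 2}
models = {
    "a": [(1, 2), (3,), (4,)],
    "b": [(1, 2), (1, 3), (4,)],
    "c": [(1, 2), (1, 3), (2, 3), (4,)],
    "d": [(1, 2, 3), (4,)],
}
for name, gens in models.items():
    print(name, independent_parameters(sizes, gens))
# a 11, b 15, c 19, d 27
```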

In practice: this doesn't always work, for several reasons.

* Crossproducts often contain many apparently categorical environments
  ('knockouts') -- mostly because of low cell occupancy, but also because
  of systematic gaps -- which must be excluded or collapsed for analysis.
  Performing these simplifications sometimes produces nonsensical results.
  I've often found that, once knockouts are excluded, a model containing
  a 3-way interaction has *fewer* independently estimated parameters than
  the supposedly 'simpler' model containing the three 2-way interactions.
  Thus *in some cases* you won't be able to use the recommended model
  test, and some more indirect approach will be necessary.

* Crossproducts often contain a large number of factors. This may mean that
  the overall model has a higher number of parameters than is justified by
  the number of tokens in the dataset. Thus, accidental redundancy (where
  several combinations of factors describe the same set of tokens) may
  result. This is particularly likely when you include two factor groups
  based partly on the same distinctions (e.g. the 1*2, 1*3 crossproducts,
  which will both partition the dataset along the divisions from the
  original group 1). I must emphasise that including such crossproducts of
  shared factor groups does not necessarily result in redundancy (in contrast
  to what my original statement implied) -- but it does make it more likely.
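
Both problems can be screened for before fitting. The Python sketch
below is only an illustration: the token layout (a tuple of factor codes
plus a flag for whether the rule applied), the function names and the
five tokens are all invented. knockouts() lists factors whose tokens are
categorical, and duplicate_partitions() lists factors from any groups
that pick out exactly the same set of tokens.

```python
from collections import defaultdict


def knockouts(tokens, group):
    """Factors in `group` with 0% or 100% application."""
    counts = defaultdict(lambda: [0, 0])  # factor -> [applications, total]
    for factors, applied in tokens:
        cell = counts[factors[group]]
        cell[0] += applied
        cell[1] += 1
    return {f for f, (apps, total) in counts.items() if apps in (0, total)}


def duplicate_partitions(tokens, n_groups):
    """Sets of (group, factor) pairs that describe the same tokens."""
    members = defaultdict(set)
    for idx, (factors, _) in enumerate(tokens):
        for g in range(n_groups):
            members[(g, factors[g])].add(idx)
    by_set = defaultdict(list)
    for key, idxs in members.items():
        by_set[frozenset(idxs)].append(key)
    return [sorted(keys) for keys in by_set.values() if len(keys) > 1]


# Hypothetical tokens coded for two crossproduct groups, 1*2 and 1*3.
tokens = [
    (("a*x", "a*p"), True),
    (("a*x", "a*p"), False),
    (("b*x", "b*q"), True),
    (("b*y", "b*q"), True),
    (("b*y", "b*q"), False),
]
print(knockouts(tokens, 0))             # {'b*x'}
print(duplicate_partitions(tokens, 2))  # [[(0, 'a*x'), (1, 'a*p')]]
```

In this toy dataset the factor b*x is a knockout because its only token
is an application, and a*x and a*p are redundant with each other because
they cover the same two tokens.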

Cheers,
   Robert Sigley.
+-----------------------------------------------+
| Robert Sigley, Foreign Languages Dept         |
| (English Division), Daito Bunka University,   |
| 1-9-1 Takashimadaira, Itabashi-ku, Tokyo 175  |
+-----------------------------------------------+
