[Corpora-List] canonical order

Geert-Jan M. Kruijff gj at CoLi.Uni-SB.DE
Thu Dec 5 08:45:42 UTC 2002


Jim,

 >         I have the very firm idea that canonical order is that in part
 > because it is the most frequent order in the language.

I think this depends a bit on what you see as the function of word
order in a language that has "free" word order. Many people have
argued that word order helps realize information structure in such
languages -- Prague School, Vallduvi's information packaging, etc.

THAT variation in word order does indeed indicate different
information structure can be seen from the 'fact' that, even though
different variations might be equally well-formed, they are not
necessarily interchangeable in a given context.

If you adopt this view on the function of word order, the
"canonical" word order would be the order that realizes an "all-focus"
construction, i.e. one in which no item is indicated as being
dependent on the preceding context ("given"). Needless to say, this
does not need to be the most frequent order.

(NOTE: I am purely concerned here with surface word order, not with
"deep" word order.)

(Shameless plug: See my dissertation for formal models of this view,
based on the Prague School. The dissertation is available from my
website.)

 > However, I have
 > done no research on this supposed fact, and cannot think of any
 > offhand.  Does anyone know of any work on the relative frequency of
 > sentences in canonical order and those showing variation in that
 > order?  Of course, this would be especially useful in a 'free word
 > order' language like Spanish, but anything would be welcome.  Likewise,
 > she would be interested in the relative frequency of the different
 > orders of the basic elements, if anyone knows of any work on that (one
 > type of sentence and its variants that she is working with is SUBJECT -
 > VERB - OBJECT - CIRCUMSTANTIAL_COMPLEMENT -- the last is normally a
 > prepositional phrase or adverbial phrase; this would produce in
 > principle 24 different orders in this case, *all* of which are
 > attested and attestable in Spanish, though presumably with rather
 > different relative frequencies of use).

... but do the 24 orders presuppose identical contexts?
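
For what it is worth, the figure of 24 is simply 4! (four constituents
in every possible linear order). A minimal Python sketch of the
counting involved follows, assuming each clause has already been
annotated with the surface order of its four constituents. The labels
S, V, O, C, the function names, and the toy sample are all made up for
illustration; this is not real corpus data, and it says nothing about
whether the clauses occur in comparable contexts.

```python
from collections import Counter
from itertools import permutations

# Hypothetical labels: Subject, Verb, Object, Circumstantial complement.
CONSTITUENTS = ("S", "V", "O", "C")


def all_orders(constituents=CONSTITUENTS):
    """Return every possible linear order of the given constituents."""
    return list(permutations(constituents))


def relative_frequencies(observed):
    """Map each possible order to its share of the observed clauses."""
    counts = Counter(tuple(order) for order in observed)
    total = sum(counts.values())
    return {
        order: (counts[order] / total if total else 0.0)
        for order in all_orders()
    }


# Made-up toy sample of annotated clauses, NOT real corpus data.
toy_sample = [
    ("S", "V", "O", "C"),
    ("S", "V", "O", "C"),
    ("C", "S", "V", "O"),
    ("V", "S", "O", "C"),
]

freqs = relative_frequencies(toy_sample)
print(len(all_orders()))            # 24 possible orders
print(freqs[("S", "V", "O", "C")])  # 0.5 in this toy sample
```

Orders that never occur in the sample still appear in the result with a
relative frequency of 0.0, so attested and unattested orders can be
compared side by side.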

 >         She would also like to know who was the first person to coin the
 > term 'canonical order', or to whom it is attributed.  (Or is it just an
 > idea that 'grew'?  This last seems to me to be unlikely, but if anyone
 > has any really old references to the notion, I guess I might have to
 > accept it)

Greenberg, in his "Some universals of grammar with particular
reference to the order of meaningful elements" talks of "basic order",
and refers to work on typology dating back to the nineteenth century
(Footnote 4, page 105). Greenberg himself conceives of the basic order
as the "dominant" order. Essentially, a dominant order is the order
that always occurs as implicatum in universals about word order -- the
preferred order, other things being equal -- i.e. capturing in a
typological fashion the idea of canonical word order. (For a nice
explanation, see Croft's book "Typology & universals", p.53ff.)

When it comes to word order variation, neither Greenberg nor Hawkins
(in his book "Word Order Universals") says much. Steele published on
this in the 1970s, proposing to characterize variation on a discrete
scale of "rigid", "mixed" and "free". This scale is more fine-grained
than e.g. Skalicka's characterization of variability, based on
morphology. (In my dissertation, I tried to extend Steele's
characterization, and tie it into a characterization of information
structure as typological category.)

I'm not sure whether the above answers your questions completely :-)
What it does point to, though, is that canonicity first of all seems
to depend on what you consider to be the *function* of word
order. Only once you have fixed THAT does it make sense to start
collecting frequency data, I guess.

Best regards,

Geert-Jan

=============================================================
Dr.ir. Geert-Jan M. Kruijff

Computational Linguistics          Room 3.03, Building 17
University of the Saarland         Phone:  +49.(681).302.4502
Postfach 15 11 50                  Mobile: +49 .179. 479.5892
D-66041 Saarbruecken (Germany)     Fax:    +49.(681).302.4700

gj at coli.uni-sb.de, gj at acm.org      www.coli.uni-sb.de/~gj

"Communications without intelligence is noise;  Intelligence
 without communications is irrelevant."
 -- Alfred M. Gray


