[Corpora-List] Re: Common connectors

FIDELHOLTZ_DOOCHIN_JAMES_LAWRENCE jfidel at siu.buap.mx
Mon Apr 25 01:02:54 UTC 2005


Wallace Chen wrote:

<<I am currently doing research on Chinese connectors, which have around
270 types and broadly include conjunctions and sentence adverbs. These are
derived from a five-million-word corpus of contemporary Chinese. My question
is how to determine which ones are "common"? Are there statistical criteria
(e.g. a cut-off point) to determine "common connectors" from such a list?>>

Xiao, Zhonghua (also known as Richard) wrote back:

<<I think there is no established statistical norm for what should be
considered as "common". Maybe we can take account of the two factors
underlying Mike Scott's idea of "key keyword": frequency and dispersion. If
an item is frequent and it also occurs in a large number of genres and/or
texts in your corpus, it can be considered as "common". The cut-off points
for frequency and coverage, of course, depend upon how many connectors you
want to include in your study.>>
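
A minimal sketch of the frequency-plus-dispersion idea above, in Python. It
assumes the corpus is already tokenized and split into texts; the function
name, the two thresholds, and their default values are hypothetical choices
for illustration, not an established norm (as noted, the cut-offs depend on
how many connectors the study wants to keep).

```python
from collections import Counter


def common_connectors(texts, connectors, min_per_million=50.0, min_coverage=0.5):
    """Return connectors that are both frequent and widely dispersed.

    texts: dict mapping a text (or genre) id to its list of tokens.
    connectors: iterable of connector types to evaluate.
    min_per_million, min_coverage: illustrative cut-offs (assumptions).
    """
    connectors = set(connectors)
    total_tokens = sum(len(tokens) for tokens in texts.values())
    if total_tokens == 0:
        return {}

    freq = Counter()        # total occurrences of each connector
    text_count = Counter()  # number of texts in which each connector occurs
    for tokens in texts.values():
        counts = Counter(t for t in tokens if t in connectors)
        freq.update(counts)
        text_count.update(counts.keys())

    result = {}
    for c in connectors:
        per_million = freq[c] * 1_000_000 / total_tokens
        coverage = text_count[c] / len(texts)
        # Keep the item only if it passes both cut-offs.
        if per_million >= min_per_million and coverage >= min_coverage:
            result[c] = (per_million, coverage)
    return result
```

Calling it with a dict of tokenized texts and the 270 connector types would
return the subset passing both cut-offs, each with its per-million frequency
and the share of texts it occurs in. Raising or lowering either threshold
shrinks or grows the "common" set.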

Yuanyong Wang <wyy at cse.unsw.EDU.AU> also wrote back:

<<... I think it depends on what you mean when referring to "common";
there are different sets of common words for different domains. If the
connectors play some role similar to that of functional words then I suggest
they are all common (irrespective of domains). Regardless, 270 words
extracted from a five-million-word corpus don't seem to be a very big
set....>>

While these comments are well taken, they are not the whole story, depending
on what the researcher's interests are.  As long ago as 1975 (_Chicago
linguistic society_), I showed that, in English, in at least some cases,
what counts as 'common' (I think I used the term 'familiar') depends on the
phonological structure of the word, as far as vowel reduction is concerned.
Thus, while frequency is indeed important and even crucial for many things
in language, other factors may impinge on its effects.  In English, for
example, some quite rare words (e.g., 'berserk') act phonologically like
common words, because of their semantic/phonological saliency (you might
even want to say outrageousness).  In the same way, it *might* be the case
(this is just a wild guess) that, for example, two-syllable connectors, or
especially ones ending in a consonant cluster, say, could act differently
from one-syllable ones, independently of their frequency.  (Of course,
unless Chinese has by now totally lost its monosyllabic character, this
hypothetical example would not be valid for Chinese, but some other
morphophonological characteristic might influence things.)

Jim

James L. Fidelholtz
Posgrado en Ciencias del Lenguaje, ICSyH
Benemérita Universidad Autónoma de Puebla     MÉXICO


