FW: Corpora: Morphology and Word Length (was: Relative text length)

Tadeusz Piotrowski tadpiotr at plusnet.pl
Fri Apr 26 18:43:31 UTC 2002


Is there really any language-independent morphology? I doubt it, and I
recall that even for one language there are conflicting views on
morphology, i.e. a word has as many morphemes as the theory allows.
Regards
Tadeusz Piotrowski

> -----Original Message-----
> From: owner-corpora at lists.uib.no
> [mailto:owner-corpora at lists.uib.no] On Behalf Of Mike Maxwell
> Sent: Friday, April 26, 2002 3:37 PM
> To: corpora at lists.uib.no
> Subject: Corpora: Morphology and Word Length (was: Relative
> text length)
>
>
>
> Damlon Davison writes:
> >It may be obvious, but agglutinating languages
> >tend to have longer words
>
> --or at least the _average_ length of words in agglutinating
> languages tends to be longer, which presumably is what is
> meant here.  Languages like English that have substantial
> derivational morphology can have some long words, but a
> glance at a text in an agglutinating language like Quechua
> will show the difference in average length.
>
> I suspect polysynthetic languages also have long words,
> but whether that's true on the average, or only of
> some words (verbs with incorporated nouns, say), I don't
> know.  I've never looked at an extended text in such a
> language.  And of course compounding can create long words
> (look at a German text), and perhaps reduplication in
> languages that use whole-word reduplication.
>
> I suspect that another influence on word length is the
> phonology: languages with large phoneme inventories tend to
> have shorter words.  Does anyone have data on this?  E.g.
> languages with large numbers of consonants (the Caucasus
> region?), or languages with lots of tones (some Chinese
> languages, in Romanized scripts of course, or the Chinantec
> languages of Mexico), as opposed to languages like Hawai'ian,
> which is notorious for a small phoneme inventory (around 13,
> as I recall) and long words.
>
> Since there are at least two factors related to word length
> (morphology and phonology), and several different factors
> within morphology, I wonder whether anyone has experimented
> with automatic classification of morphological type.  We're
> having a workshop at the ACL this summer on morphology
> learning, but it ought to be possible to get a rough idea of
> how many affixes there are without learning the "entire"
> morphology.  Perhaps just seeing how compressible a text is
> would give you some idea, or turning it into a minimized FSA.
>
> Finally, there is a big caveat: the length of a word depends
> very much on orthographic decisions.  Are clitics written
> solid?  Compounds?
>
> Written German has long 'words' because the compound nouns
> are written solid.  If they were written with a space between
> the nouns, the words would become a lot shorter--not to
> mention how much easier they would be to read.  I guess the
> original observation on this is by Mark Twain :-).
>
> I have even heard of a language where the linguist who
> designed the orthography decided to write a space between
> each morpheme, turning an agglutinating language into an
> isolating language in the orthography!  (One wonders how the
> written language will look after a generation or two.)
>
>      Mike Maxwell
>      Linguistic Data Consortium
>      maxwell at ldc.upenn.edu
>
>
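A rough sketch of the two measurements mentioned in the quoted message,
average word length and how compressible a text is. It assumes Python with
the standard re and zlib modules. The function names and the tokenization
rule (a "word" is a run of Unicode letters) are assumptions made for
illustration, not anything proposed in the thread, and zlib is only one
possible stand-in for "compressibility".

```python
import re
import zlib

# Assumption: a "word" is a maximal run of Unicode letters. Apostrophes,
# hyphens and digits split words, so the result depends on orthography,
# which is exactly the caveat raised in the quoted message.
WORD_RE = re.compile(r"[^\W\d_]+")


def mean_word_length(text: str) -> float:
    """Return the mean token length in characters (0.0 for no tokens)."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def compression_ratio(text: str) -> float:
    """Return compressed size / original size in bytes (UTF-8, zlib level 9).

    Lower values mean the text is more repetitive at the byte level.
    An empty text is reported as 1.0 (nothing to compress).
    """
    data = text.encode("utf-8")
    if not data:
        return 1.0
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    solid = "Donaudampfschiff"
    spaced = "Donau Dampf Schiff"
    # The same compound written solid and written with spaces gives very
    # different "word lengths" although the language is unchanged.
    print(mean_word_length(solid), mean_word_length(spaced))
    print(compression_ratio("ab" * 1000))
```

Ratios from different texts are only comparable if the samples are of
similar size and genre, and a byte-level compressor is also affected by the
script (characters that take several bytes in UTF-8), so this gives at best
a crude signal about morphology.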


