FW: Corpora: Morphology and Word Length (was: Relative text length)
Tadeusz Piotrowski
tadpiotr at plusnet.pl
Fri Apr 26 18:43:31 UTC 2002
Is there really any language-independent morphology? I doubt it, and I
recall that even for one language there are conflicting views on
morphology, i.e., a word has as many morphemes as a given theory allows.
Regards
Tadeusz Piotrowski
> -----Original Message-----
> From: owner-corpora at lists.uib.no
> [mailto:owner-corpora at lists.uib.no] On Behalf Of Mike Maxwell
> Sent: Friday, April 26, 2002 3:37 PM
> To: corpora at lists.uib.no
> Subject: Corpora: Morphology and Word Length (was: Relative
> text length)
>
>
>
> Damlon Davison writes:
> >It may be obvious, but agglutinating languages
> >tend to have longer words
>
> --or at least the _average_ length of words in agglutinating
> languages tends to be longer, which presumably is what is
> meant here. Languages like English that have substantial
> derivational morphology can have some long words, but a
> glance at a text in an agglutinating language like Quechua
> will show the difference in average length.
>
> I suspect polysynthetic languages also have long words,
> but whether that's true on average, or only of
> some words (verbs with incorporated nouns, say), I don't
> know. I've never looked at an extended text in such a
> language. And of course compounding can create long words
> (look at a German text), and perhaps reduplication in
> languages that use whole-word reduplication.
>
> I suspect that another influence on word length is the
> phonology: languages with large phoneme inventories tend to
> have shorter words. Does anyone have data on this? E.g.
> languages with large numbers of consonants (the Caucasus
> region?), or languages with lots of tones (some Chinese
> languages, in Romanized scripts of course, or the Chinantec
> languages of Mexico), as opposed to languages like Hawai'ian,
> which is notorious for a small phoneme inventory (around 13,
> as I recall) and long words.
>
> Since there are at least two factors related to word length
> (morphology and phonology), and several different factors
> within morphology, I wonder whether anyone has experimented
> with automatic classification of morphological type. We're
> having a workshop at the ACL this summer on morphology
> learning, but it ought to be possible to get a rough idea of
> how many affixes there are without learning the "entire"
> morphology. Perhaps just seeing how compressible a text is
> would give you some idea, or turning it into a minimized FSA.
>
> Finally, there is a big caveat: the length of a word depends
> very much on orthographic decisions. Are clitics written
> solid? Compounds?
>
> Written German has long 'words' because the compound nouns
> are written solid. If they were written with a space between
> the nouns, the word length would become a lot shorter--not to
> mention how much easier it would be to read. I guess the
> original observation on this is by Mark Twain :-).
>
> I have even heard of a language where the linguist who
> designed the orthography decided to write a space between
> each morpheme, turning an agglutinating language into an
> isolating language in the orthography! (One wonders how the
> written language will look after a generation or two.)
>
> Mike Maxwell
> Linguistic Data Consortium
> maxwell at ldc.upenn.edu
>
>
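The two rough measurements suggested in the quoted message, average word
length and text compressibility, can be sketched as below. This is only an
illustration under stated assumptions, not a method anyone in the thread
used: the function names are made up for the example, "word" is taken to
mean a run of letters or digits in the written text, and zlib stands in for
"a compressor". Both numbers inherit the orthographic caveat above, and
comparing compression ratios across languages would need texts of similar
size and content.

```python
import re
import zlib


def mean_word_length(text: str) -> float:
    """Average length, in characters, of orthographic words in `text`.

    Assumption: a word is a maximal run of letters/digits (regex \\w+).
    Apostrophes and hyphens split words, so the result depends on the
    orthography, not on the morphology itself.
    """
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size of the UTF-8 encoded text.

    Lower values mean more redundancy. Whether that redundancy reflects
    affixation is an open question, not something this function shows.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 1.0
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    # The same compound written solid and written with spaces.
    for sample in ("Donaudampfschiff", "Donau dampf schiff"):
        print(sample, mean_word_length(sample), compression_ratio(sample))
```

Writing the same compound solid or with spaces changes the average word
length without changing the language, which is exactly the orthographic
effect described in the message.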