Corpora: Relative text length

Damon Davison allolex at SDF.LONESTAR.ORG
Thu Apr 25 20:12:36 UTC 2002


Linguists interested in comparative word length are most likely
interested in *written* language.  In fact, most corpus research is
based on writing, since at the current state of technology, corpus
research on speech is difficult.  As one example of why word length is
useful, comparing word lengths across languages can provide a quick
means of error-checking machine translation output.  It is also
possible, to a certain extent, to characterize languages typologically
by word length.  It may be obvious, but agglutinating languages tend to
have longer words.
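
As a rough illustration of the kind of check I have in mind, here is a
minimal Python sketch.  Everything in it is my own assumption rather
than an established method: the function names are made up, a "word" is
taken to be any unbroken run of letters, and length is counted in
characters.  What counts as a suspicious ratio for a given language
pair would have to be worked out from real parallel text.

```python
import re

# Assumption: a "word" is an unbroken run of Unicode letters.
# Digits, underscores and punctuation are treated as separators.
WORD_RE = re.compile(r"[^\W\d_]+")


def mean_word_length(text):
    """Return the mean length, in characters, of the words in text."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def length_ratio(source, target):
    """Return target mean word length divided by source mean word length."""
    src = mean_word_length(source)
    if src == 0:
        raise ValueError("source text contains no words")
    return mean_word_length(target) / src
```

Comparing English "in our houses" with Finnish "taloissamme" through
length_ratio gives a ratio well above 1, as one would expect for an
agglutinating target language.  A translation whose ratio falls far
from what is typical for the language pair might be worth a second
look.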

I don't think it would be wrong to say that linguistics is the study of
language as a system.  (Human beings seem to systematize things quite
naturally.)  Written language also belongs to the system of language.
In fact, writing systems may even tell you more about the language than
speech analysis does, since written language often preserves historical
information that contributes to our understanding of current language use.


Damon Davison

--
Damon Allen Davison
http://allolex.lonestar.org
allolex at sdf.lonestar.org


