[Corpora-List] Are frequency lists of the most languages equivalent?

maxwell maxwell at umiacs.umd.edu
Mon Oct 10 12:50:48 UTC 2011


On Mon, 10 Oct 2011 13:23:10 +0200, Alexander Osherenko <osherenko at gmx.de>
wrote:
> I am wondering if frequency lists of the most languages can be considered
> as equivalent. For instance, consider an English frequency list such as 
> the BNC frequency list... and a German frequency list
> The English frequency list starts with the definite article "the". The
> German one - with the definite article "der". Hence, the literal
> translation of the word "the" in German will result the word "der".
> 
> Of course, it is not always enough to translate directly. However, I
> wouldn't wonder if say 70%-80% of the most frequent words in the most
> languages can be considered as equal. 

My off-the-cuff impression is that there is a large amount of variability
in function words across languages, more so than among lexical (content)
words, and particularly so when you get away from Indo-European languages.  For
example, many languages have no equivalent of definite or indefinite
articles--either none at all, or the distinction is made in the morphology.
Similarly with auxiliary verbs.

The English preposition "of", which came up in later discussion in this
thread, is another example.  It has essentially no meaning--it's the
preposition you use when you need a preposition to relate two nouns, and
there isn't any more specific preposition that works.  I would doubt that
all languages have such a preposition (or adposition).  My favorite
language, Tzeltal, does--but it only has one other preposition, with the
approximate meaning of "with."  All other functions that we use
prepositions for are done with the generic preposition + a noun, e.g. 
   ta y-ut        s-na' 
   at POSS-inside his-house
   "in (the) house"
(Ut is a noun.)  So if you look for one-word equivalents of most English
prepositions, you won't find them.  There are also no possessive pronouns
in this language--possessives are prefixes, as in the above example.  (There is
also a periphrastic possessive, a syntactic construction.)

Pronouns are also variable.  English over-uses them, because every finite
clause has to have a subject.  But most languages don't require an overt
subject--Spanish doesn't even have a real equivalent to English "it."

All that said, it would certainly be interesting to compare such frequency
lists across languages, particularly non-Indo-European languages.  My
feelings won't be hurt if I turn out to be wrong.
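
One rough way such a comparison might be set up is sketched below, assuming
you have a tokenized corpus for each language and some bilingual lexicon.
The function names, the toy word lists and the lexicon are all invented
for illustration; nothing here comes from the BNC or any real frequency
list.  The measure is simply the share of the top-N source words that have
at least one listed translation among the top-N target words, so a
language that expresses a function in its morphology will just show up as
a miss.

```python
from collections import Counter


def top_words(tokens, n):
    """Return the n most frequent tokens, most frequent first."""
    return [word for word, _ in Counter(tokens).most_common(n)]


def translated_overlap(src_top, tgt_top, lexicon):
    """Fraction of src_top words with a listed translation in tgt_top.

    lexicon maps a source word to a list of candidate translations
    (a hypothetical hand-built dictionary, not a real resource).
    """
    if not src_top:
        return 0.0
    tgt = set(tgt_top)
    hits = [w for w in src_top if tgt & set(lexicon.get(w, ()))]
    return len(hits) / len(src_top)


# Toy data, made up for illustration only.
english_tokens = "the house of the man is in the town".split()
toy_lexicon = {
    "the": ["der", "die", "das"],
    "of": ["von"],
    "it": ["es"],
}

english_top = ["the", "of", "it"]   # assumed top-3 list
german_top = ["der", "in", "und"]   # assumed top-3 list
score = translated_overlap(english_top, german_top, toy_lexicon)
print(f"share of top English words matched: {score:.2f}")
```

A one-word-to-one-word lexicon is itself a strong assumption: cases like the
Tzeltal preposition + noun construction above would need multi-word or
morphological entries, which this sketch does not attempt.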

   Mike Maxwell
