[Corpora-List] Are frequency lists of most languages equivalent?

Alexander Osherenko osherenko at gmx.de
Mon Oct 10 11:23:10 UTC 2011


Hi all,

I am wondering whether the frequency lists of most languages can be considered
equivalent. For instance, consider an English frequency list such as the BNC
frequency list (http://www.kilgarriff.co.uk/bnc-readme.html)
and a German frequency list
(http://german.about.com/library/blwfreq01.htm).
The English list starts with the definite article "the", the German one with
the definite article "der". Hence, translating the word "the" literally into
German gives the word "der".

Of course, direct translation is not always enough. However, I would not be
surprised if, say, 70%-80% of the most frequent words in most languages could
be considered equivalent. Note that I am not claiming the words are also
ordered in the same way. For example, in an English list the word "of" always
ranks above the word "appear". Nevertheless, I expect that both "of" and
"appear" are present among the most frequent words of most languages, in
whatever order, even if a particular language uses "appear" more often than
"of".
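One way to make the 70%-80% guess testable is to translate the top N words of
one list and count how many of them land anywhere in the top N of the other
list, ignoring rank. The sketch below assumes a hypothetical bilingual
dictionary that maps each word to a set of candidate translations (English
"the" can be German "der", "die" or "das"); the function name, the dictionary
and the tiny word lists are made up for illustration and are not taken from
the BNC or the German list above.

```python
# Hypothetical toy English->German dictionary; one word may have several translations.
EN_DE = {
    "the": {"der", "die", "das"},
    "of": {"von"},          # German often expresses "of" with the genitive case instead
    "and": {"und"},
    "to": {"zu", "nach"},
}


def translated_overlap(source_top, target_top, translate):
    """Fraction of source_top words with at least one translation in target_top.

    Rank order is ignored: a hit only requires membership in the target list.
    """
    if not source_top:
        return 0.0
    target_set = set(target_top)
    hits = sum(1 for word in source_top if translate.get(word, set()) & target_set)
    return hits / len(source_top)


if __name__ == "__main__":
    en_top = ["the", "of", "and", "to"]   # illustrative only, not real BNC ranks
    de_top = ["der", "die", "und", "in"]  # illustrative only
    print(translated_overlap(en_top, de_top, EN_DE))  # 0.5
```

Here "of" counts as a miss even though German expresses the same relation
grammatically, which is exactly the case where direct translation is not
enough; a real comparison would need a richer dictionary and a larger N.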


Alexander
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

