[Corpora-List] (no subject)
maxwell
maxwell at umiacs.umd.edu
Wed Oct 19 19:26:23 UTC 2011
On Wed, 19 Oct 2011 17:56:27 +0100 (BST), Abu Fahad
<salehosaimi at yahoo.com> wrote:
> Are you aware of any frequency list of Arabic?
Let me up the generality a bit, since "frequency lists of language X"
seems to be a common request.
I would be surprised if there weren't frequency lists for many languages.
There are open questions, of course (should the words be stemmed or not?
what does it mean for the source corpus to be "balanced"?), but such lists
probably have at least some utility. Is there a place with links to
frequency lists of multiple languages?
There are lists of "correctly" spelled words for some languages, which are
sometimes grouped into top 10k words, top 20k, etc. I suppose one could
derive a very coarse-grained ranking from such lists: a word's rank is
simply the smallest group it appears in. Note that these would be
inflected words, not stemmed words.
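
To make the coarse-grained ranking idea concrete, here is a minimal Python
sketch. It assumes you already have the tiered lists in hand; the function
name, the dict layout, and the toy words are all made up for illustration
and do not come from any of the lists linked below.

```python
def coarse_ranks(tiers):
    """Map each word to the size of the smallest tier that contains it.

    `tiers` is a hypothetical layout: {tier_size: iterable_of_words},
    e.g. {10000: [...], 20000: [...]}.
    """
    ranks = {}
    # Visit the smallest tier first so a word keeps its best (lowest) tier.
    for size in sorted(tiers):
        for word in tiers[size]:
            ranks.setdefault(word, size)
    return ranks


# Toy data standing in for real "top 10k" / "top 20k" word lists.
tiers = {
    20000: ["the", "walked", "corpus", "lemma"],
    10000: ["the", "walked"],
}
ranks = coarse_ranks(tiers)
```

All this tells you is that "walked" falls in the top-10k group and "corpus"
only in the top-20k group; nothing is ordered within a group, and "walked"
and "walks" are still separate entries.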
I looked at the ACL wiki
(http://aclweb.org/aclwiki/index.php?title=Main_Page), but nothing jumped
out at me. So I'll prime the pump with a few links to such lists I did
find:
http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
http://invokeit.wordpress.com/frequency-word-lists/
(there's a link to a link to an Arabic list here)
http://borel.slu.edu/crubadan/index.html
("...send me an email if you're interested in
a particular language and there's plenty of
data I am free to share (frequency lists...")
Mike Maxwell