Arabic-L:LING:Word Form List from arabiCorpus
Dilworth Parkinson
dil at BYU.EDU
Wed Feb 16 19:57:48 UTC 2011
------------------------------------------------------------------------
Arabic-L: Wed 16 Feb 2011
Moderator: Dilworth Parkinson <dil at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
unsubscribe arabic-l ]
-------------------------Directory------------------------------------
1) Subject: Word Form List from arabiCorpus
-------------------------Messages-----------------------------------
1)
Date: 16 Feb 2011
From: Dil Parkinson <dil at byu.edu>
Subject: word form list from arabiCorpus
A couple of people asked me about a word frequency list from arabiCorpus. Of course, arabiCorpus is an unlemmatized corpus, so it is not possible to create a true (lemma-based) word frequency list for it. However, it is possible to create a 'word form' list, in which every distinct graphemic word form is counted separately. This means that يكتب is counted separately not only from تكتب, but also from ويكتب, يكتبه, فيكتب, ليكتب, etc. Anyway, I have produced such a list, and have made it available for download at the following URL:
arabiCorpus.byu.edu/wordFormListSource.html
Once you get there, click on the folder, click on the file you want to download, and choose 'more' from the sub-menu, which lets you choose 'download'. There is an info file which explains what the different files are.
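For anyone wondering what counting 'word forms' amounts to in practice, here is a minimal Python sketch. It is only an illustration, not the procedure actually used to build the arabiCorpus list: the function name word_form_counts, the whitespace tokenization, and the set of punctuation stripped from token edges are all assumptions made for the example.

```python
from collections import Counter


def word_form_counts(text):
    """Count each distinct graphemic word form separately (no lemmatization).

    Hypothetical helper: tokens are split on whitespace, and common Latin
    and Arabic punctuation is stripped from the edges of each token.
    """
    edge_punct = ".,;:!?\"'()[]«»،؛؟"
    tokens = (tok.strip(edge_punct) for tok in text.split())
    # Drop tokens that were pure punctuation and are now empty.
    return Counter(tok for tok in tokens if tok)


# Made-up sample text, not taken from the corpus.
sample = "يكتب الولد ويكتب الرجل، يكتب ليكتب"
counts = word_form_counts(sample)

# List the forms from most to least frequent.
for form, n in counts.most_common():
    print(form, n)
```

Because nothing is lemmatized, يكتب, ويكتب and ليكتب each get their own entry here, even though they all belong to the same verb.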
dil
--------------------------------------------------------------------------
End of Arabic-L: 16 Feb 2011