<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><span class="Apple-style-span" style="font-family: arial; font-size: 13px; "><pre id="nonprop"><p>------------------------------------------------------------------------
Arabic-L: Wed 16 Feb 2011
Moderator: Dilworth Parkinson <<a href="mailto:dil@byu.edu">dil@byu.edu</a>>
[To post messages to the list, send them to arabic-l@byu.edu]
[To unsubscribe, send message from same address you subscribed from to
<a href="mailto:listserv@byu.edu">listserv@byu.edu</a> with first line reading:
unsubscribe arabic-l ]
-------------------------Directory------------------------------------
1) Subject: Word Form List from arabiCorpus
-------------------------Messages-----------------------------------
1)
Date: 16 Feb 2011
From: Dil Parkinson <<a href="mailto:dil@byu.edu">dil@byu.edu</a>>
Subject: word form list from arabiCorpus
A couple of people asked me about a word frequency list from arabiCorpus. Because arabiCorpus is an unlemmatized corpus, it is not possible to create a true (lemma-based) word frequency list for it. However, it is possible to create a 'word form' list, in which every distinct graphemic word form is counted separately. This means that يكتب is counted separately not only from تكتب, but also from ويكتب, يكتبه, فيكتب, ليكتب, etc. I have produced such a list and made it available for download at the following URL:
<a href="http://arabiCorpus.byu.edu/wordFormListSource.html">arabiCorpus.byu.edu/wordFormListSource.html</a>
Once you get there, click on the folder, then click on the file you want to download and choose 'more' from the sub-menu, which lets you choose 'download'. An info file explains what the different files are.
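As an illustration of what counting 'word forms' means, here is a minimal Python sketch. It is not the script used to build the arabiCorpus list; the sample sentence, the function name count_word_forms, and the simple tokenization rule (any run of Arabic letters is one form) are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical sample text, invented for this example (not from arabiCorpus).
SAMPLE = "يكتب الولد ويكتب المعلم، فيكتب الجميع. يكتب الولد"


def count_word_forms(text):
    """Count each distinct graphemic word form separately (no lemmatization)."""
    # Assumption: a word form is any unbroken run of Arabic letters
    # (U+0621 to U+064A). Prefixes and suffixes stay attached, so
    # ويكتب and يكتب end up as two different entries.
    tokens = re.findall(r"[\u0621-\u064A]+", text)
    return Counter(tokens)


counts = count_word_forms(SAMPLE)

# Print the forms from most to least frequent.
for form, n in counts.most_common():
    print(form, n)
```

Because clitics stay attached, يكتب, ويكتب and فيكتب each get their own count here, which is why a list of this kind differs from a lemma frequency list.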
dil
--------------------------------------------------------------------------
End of Arabic-L: 16 Feb 2011
</p><div><br></div></pre></span></body></html>