Arabic-L:LING:Word Form List from arabiCorpus

Dilworth Parkinson dil at BYU.EDU
Wed Feb 16 19:57:48 UTC 2011


------------------------------------------------------------------------
Arabic-L: Wed 16 Feb 2011
Moderator: Dilworth Parkinson <dil at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
            unsubscribe arabic-l                                      ]

-------------------------Directory------------------------------------

1) Subject: Word Form List from arabiCorpus

-------------------------Messages-----------------------------------
1)
Date: 16 Feb 2011
From: Dil Parkinson <dil at byu.edu>
Subject: word form list from arabiCorpus

A couple of people have asked me about a word frequency list from arabiCorpus. Because arabiCorpus is an unlemmatized corpus, it is not possible to create a true word (lemma) frequency list for it. However, it is possible to create a 'word form' list, in which every distinct graphemic word form is counted separately. This means that يكتب is counted separately not only from تكتب, but also from ويكتب, يكتبه, فيكتب, ليكتب, etc. I have produced such a list and made it available for download at the following URL:
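To make the word-form idea concrete, here is a minimal Python sketch. It assumes a hypothetical tokenizer that treats any unbroken run of Arabic letters and diacritics (U+0621 to U+0652) as one form. It is not the procedure used to build the arabiCorpus list. It only shows why, without lemmatization, attached clitics and pronouns produce separate entries.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not arabiCorpus's own): a word form is
# any maximal run of Arabic letters/diacritics in U+0621..U+0652. Prefixed
# clitics (و, ف, ل) and suffixed pronouns stay attached, so يكتب and ويكتب
# are distinct forms. Spaces and the Arabic comma (U+060C) act as separators.
ARABIC_RUN = re.compile(r"[\u0621-\u0652]+")

def word_form_counts(text: str) -> Counter:
    """Count every distinct graphemic word form separately (no lemmatization)."""
    return Counter(ARABIC_RUN.findall(text))

sample = "يكتب، تكتب ويكتب يكتبه فيكتب ليكتب يكتب"
counts = word_form_counts(sample)
for form, n in counts.most_common():
    print(form, n)
```

Here يكتب is counted twice, while each of the other five forms is counted once, even though a lemmatized list would group them together.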

arabiCorpus.byu.edu/wordFormListSource.html

Once you get there, click on the folder, click on the file you want to download, and choose 'more' from the sub-menu, which gives you the 'download' option. An info file explains what the different files contain.

dil

--------------------------------------------------------------------------
End of Arabic-L: 16 Feb 2011

