[Corpora-List] most frequent 50 words in Egyptian newspapers

Emad Mohamed emohamed at umail.iu.edu
Tue Aug 9 14:40:40 UTC 2011


In addition to what Aziz mentioned, you will need a word
tokenizer/segmenter in order to handle the morphological richness of Arabic.
Searching for "Arabic tokenization" will turn up tools for this.
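
To make the counting step concrete, here is a minimal sketch in Python. It
is only an illustration under my own assumptions: the names naive_tokenize
and top_words are hypothetical, and the regex tokenizer is a stand-in that
does not split Arabic clitics (and may break on diacritized text), so a
real segmenter should be passed in through the tokenize parameter.

```python
import re
from collections import Counter

# Runs of word characters; Unicode-aware in Python 3, so Arabic letters match.
TOKEN_RE = re.compile(r"\w+")


def naive_tokenize(text):
    """Split on non-word characters.

    Placeholder only: it does NOT separate Arabic clitics such as the
    conjunction or the definite article from the word they attach to.
    """
    return TOKEN_RE.findall(text.lower())


def top_words(texts, n=50, tokenize=naive_tokenize):
    """Return the n most frequent tokens across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample input; in practice these would be the downloaded articles.
    articles = ["the cat and the dog", "The bird"]
    for word, freq in top_words(articles, n=5):
        print(word, freq)
```

With the placeholder tokenizer, a form like "والكتاب" is counted as one word,
which is exactly why the segmenter matters for Arabic frequency lists.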
On Tue, Aug 9, 2011 at 4:21 PM, True Friend <true.friend2004 at gmail.com> wrote:

> Dear Montasser
> You'll have to select a few newspapers of Egyptian English and then
> download the news items after 25 January. This can be done by a website
> downloader, or by an HTML crawler, or you can write your own script (if you
> know how to write one in Python, Perl, etc.).
> After that, it would be simple enough to get a word list.
> Sorry I couldn't provide any technical and specific solution. :-)
> Regards
> --
> *Muhammad Shakir Aziz* *محمد شاکر عزیز*
> *Masters in Applied Linguistics
> Translator, Course Developer, Linguist for Urdu, Punjabi and English*
> Urdu:- http://awaz-e-dost.blogspot.com/
> English:- http://linguisticslearner.blogspot.com/
> Facebook:- http://www.facebook.com/truefriend2004
> Skype:- true_friend2004
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>


-- 
Emad Soliman Ali Mohamed
aka Emad Nawfal (*عماد نوفل*)
PhD in Linguistics, Computational Linguistics Track,
Department of Linguistics,
Indiana University, Bloomington
http://jones.ling.indiana.edu/~emadnawfal