Arabic-L:LING:Arabic Corpus

Dilworth Parkinson Dilworth_Parkinson at byu.edu
Mon Feb 25 17:42:24 UTC 2002


----------------------------------------------------------------------
Arabic-L: Mon 25 Feb 2002
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message to listserv at byu.edu with first line
reading:
           unsubscribe arabic-l                                      ]

-------------------------Directory-------------------------------------

1) Subject:Arabic Corpus
2) Subject:Arabic Corpus

-------------------------Messages--------------------------------------
1)
Date:  25 Feb 2002
From:Andrew Freeman <andyf at umich.edu>
Subject:Arabic Corpus

Hello Sonia Halimi,
   Watt's concordance tool works better than any other I am aware of for
looking at corpora in Arabic script.  It supports Arabic, or, more
precisely, it doesn't do anything to prevent you from taking advantage of
your system's Arabic capabilities if Arabic is enabled on your
computer; so if you know how to take advantage of the Arabic support
on your Windows machine, you can get it to work for Arabic.  You can
visually examine a large context around words, although the left/right
versus before/after distinction can be confusing, since left is assumed
to mean "before" (the reverse of how right-to-left Arabic text reads).
It will also generate word-frequency statistics and the like.
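The "context around words" display described above is a keyword-in-context (KWIC) concordance. Below is a minimal sketch, not Watt's implementation, assuming unvocalized Arabic text and whole-word matching via Python's Unicode-aware \w. The function name kwic and the character-based width parameter are hypothetical. Naming the pieces "before" and "after" (text order) rather than left and right sidesteps the direction confusion for right-to-left script.

```python
import re


def kwic(text, keyword, width=30):
    """Return (before, keyword, after) triples for each whole-word match.

    Hypothetical helper: "before" and "after" follow logical text order,
    so they stay correct for right-to-left Arabic regardless of display.
    Assumes unvocalized text; width is measured in characters.
    """
    # (?<!\w) and (?!\w) stop matches inside longer words
    # (e.g. the keyword must not match inside a longer word containing it).
    pattern = re.compile(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)")
    results = []
    for m in pattern.finditer(text):
        before = text[max(0, m.start() - width):m.start()]
        after = text[m.end():m.end() + width]
        results.append((before, m.group(), after))
    return results
```

For example, kwic("كتب الولد الدرس ثم كتب الرسالة", "كتب", 5) returns two triples, one per occurrence of كتب, each with up to five characters of context on either side.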
   A big problem that it won't solve for you is lemmatization, i.e., albyt
("the house") and byt ("house") get counted as separate items unless you
list every form belonging to that lemma.  There is probably a fiendishly
clever way around that in Watt's concordancer, but I have opted for C
code to solve that problem.
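To illustrate the albyt/byt problem, here is a very crude normalization sketch in Python. It is not the poster's C code and not real lemmatization. It assumes unvocalized Arabic script and only strips a leading definite article ال before counting. The names strip_article, word_frequencies and the min_stem threshold are hypothetical. Attached prefixes such as و or ب before the article are left alone, and a genuine stem that happens to begin with ال would be wrongly shortened.

```python
import re
from collections import Counter

ARTICLE = "ال"  # the Arabic definite article al-


def strip_article(word, min_stem=2):
    """Drop a leading al- if at least min_stem letters remain.

    Hypothetical heuristic: the min_stem guard keeps short words such as
    the unhamzated spelling "الى" from being reduced to a single letter.
    """
    if word.startswith(ARTICLE) and len(word) - len(ARTICLE) >= min_stem:
        return word[len(ARTICLE):]
    return word


def word_frequencies(text):
    """Count word forms, treating al-X and X as the same item.

    Assumes unvocalized text: Python's \\w does not match Arabic
    diacritics, so vocalized words would be split apart.
    """
    words = re.findall(r"\w+", text)
    return Counter(strip_article(w) for w in words)
```

With this, word_frequencies("البيت بيت كبير") counts بيت twice instead of listing البيت and بيت as separate items.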

  My advice is to avoid Athelstan's MonoConc Pro.

   You can get Rob Watt's concordance at http://www.rjcw.freeserve.co.uk/

   As for corpora, there are all kinds of texts on the web, including the
Bible, the Quraan and newspapers.

have fun,
cheers,
andy

--------------------------------------------------------------------------
2)
Date:  25 Feb 2002
From: Tim Buckwalter <TimBuckwalter at bainbridge.net>
Subject:Arabic Corpus

Sonia:
You can find a lot of text data here: http://www.muhaddith.org
This website also has software for searching texts but I haven't used it.
For extracting word combinations I suggest you learn Perl. I can help you
get started with some very simple search scripts, if you like.
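Tim recommends Perl for this. The same idea of extracting word combinations can be sketched in Python by counting adjacent word pairs (bigrams). This is a minimal illustration, not one of Tim's scripts. It assumes unvocalized text split on Python's Unicode \w, and the function names bigram_counts and find_combinations are hypothetical.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs (bigrams); assumes unvocalized text."""
    words = re.findall(r"\w+", text)
    return Counter(zip(words, words[1:]))


def find_combinations(text, first_word):
    """Return the words that immediately follow first_word, with counts."""
    return Counter({
        second: n
        for (first, second), n in bigram_counts(text).items()
        if first == first_word
    })
```

For instance, find_combinations("في البيت في البيت في المدرسة", "في") reports البيت twice and المدرسة once.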
Regards,
Tim

--------------------------------------------------------------------------
End of Arabic-L:  25 Feb 2002
