Arabic-L:LING:Arabic Corpus (plug for Perl)

Dilworth Parkinson Dilworth_Parkinson at byu.edu
Tue Feb 26 17:02:34 UTC 2002


Arabic-L: Tue 26 Feb 2002
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message to listserv at byu.edu with first line
reading:
           unsubscribe arabic-l                                      ]

-------------------------Directory-------------------------------------

1) Subject: Arabic Corpus (plug for Perl)

-------------------------Messages--------------------------------------
1)
Date:  26 Feb 2002
From: Dan Parvaz <dparvaz at unm.edu>
Subject: Arabic Corpus (plug for Perl)

For extracting word combinations I suggest you learn Perl.

I agree with Tim. Concordancing programs (including MonoConc, with all
due respect to the good doctor Barlow), like any software, can only do
what the developers predicted you might want to do. Stray from script,
and you're left with an inadequate solution. The answer: write your own
script.
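
As a rough illustration of the kind of script meant here, the following
minimal sketch counts adjacent word pairs in a piece of text. It is not
from the original discussion: the function name count_bigrams, the sample
sentence, and the definition of a "word" as any run of Unicode letters and
combining marks are all assumptions made for the example. A real corpus
would need decisions about clitics, vowelling, and punctuation that a
one-line pattern does not settle.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                               # the source below contains UTF-8 literals
binmode(STDOUT, ':encoding(UTF-8)');    # print Arabic text as UTF-8

# Hypothetical helper: count adjacent word pairs (bigrams) in a string.
# Assumption: a "word" is any run of letters (\p{L}) and combining
# marks (\p{M}), so short vowels stay attached to their word.
sub count_bigrams {
    my ($text) = @_;
    my @words = $text =~ /([\p{L}\p{M}]+)/g;
    my %count;
    for my $i (0 .. $#words - 1) {
        $count{"$words[$i] $words[$i + 1]"}++;
    }
    return \%count;
}

# Illustrative sample sentence (an assumption, not corpus data).
my $sample  = "في البيت الكبير في البيت الصغير";
my $bigrams = count_bigrams($sample);

# Print pairs, most frequent first; ties are broken alphabetically.
for my $pair (sort { $bigrams->{$b} <=> $bigrams->{$a} || $a cmp $b }
              keys %$bigrams) {
    print "$bigrams->{$pair}\t$pair\n";
}
```

Changing what counts as a word, or looking at pairs that are not strictly
adjacent, means editing a line or two of the loop, which is the flexibility
a fixed concordancer menu cannot offer.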

I'll be teaching a "computing for linguists" class this Fall
which will include a substantial Perl component, and I haven't found a
good linguist-oriented text (although Tibor Kiss told me he had one in
the works, perhaps to be published by CSLI). While the bioinformatics
Perl texts have a lot of string-manipulation stuff, they're too
specialized. You might want to look at Cross's _Data Munging with Perl_,
which has some very useful chapters, including material you might care
about for yanking web pages and stripping off HTML/XML tagging.

Cheers,

Dan.

--------------------------------------------------------------------------
End of Arabic-L:  26 Feb 2002