[Corpora-List] What is best for text processing Perl or Python?

Trevor Jenkins trevor.jenkins at suneidesis.com
Tue Mar 4 18:02:27 UTC 2008


On Tue, 4 Mar 2008, Emiliano Guevara <emiliano.guevara at unibo.it> wrote:

> And most programmers are so fond of their favorite programming
> language that they could explode in anger if someone says another
> language is better...

As a computing scientist I agree with you; there is only one programming
language and that is Algol 68. ;-)

> In any case, all programming languages can deal with text, with pros
> and cons, but all of them do it.

Oh so true. I once wrote an SGML parser in VAX BASIC, followed a few weeks
later by an object-oriented program in ... again VAX BASIC. I consoled
myself with the knowledge that Xerox's first version of Smalltalk was
written in BASIC. ;-)

But some languages are better suited to processing text than others.
Fortran (of any version) is pretty difficult. C and/or C++ make it somewhat
easier by having useful run-time library functions that can be called. Perl
is perhaps the best-known language with string manipulation features.
Personally I prefer Snobol4 and ML/I for text manipulation. The Python
NLTK (Natural Language Toolkit) should probably be in all our corpus study
toolboxes. Similar things exist for Ruby too.

> Just learn whatever you can:
> If you have a colleague who can program Perl, follow that.
> If you're into web design and know a little PHP, go on with that.
> If you like Python and you think it's cool (most do), learn Python.

Probably.

> But don't forget the basic Unix tools that can save you lots of time
> and programming: tr, diff, sed/awk, uniq, sort, etc, etc.

Definitely! One can do so much with those UNIX standard filters and a
modicum of shell scripting.
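
To make that concrete, here is a minimal sketch of what one such chain of
filters does, written in Python for comparison. It assumes the classic
word-frequency pipeline (tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' | sort |
uniq -c | sort -rn); the function name word_frequencies and the sample
sentence are made up for illustration and come from no particular toolkit.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=None):
    """Return (word, count) pairs, most frequent first."""
    # Lowercase the text and keep runs of ASCII letters as tokens,
    # roughly what the two tr invocations do in the shell version.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Counter plays the role of sort | uniq -c; most_common() orders
    # the result by descending count, like the final sort -rn.
    return Counter(tokens).most_common(top_n)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The mat was flat."
    for word, count in word_frequencies(sample, top_n=3):
        print(count, word)
```

The shell version is shorter to type; the Python version is easier to
extend, for example with a different tokenising pattern for non-ASCII text.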

And don't forget R (from the R Project at http://www.r-project.org/ );
there are several textbooks devoted to using R in corpus studies.

Regards, Trevor

<>< Re: deemed!

