[Corpora-List] Extracting text from Wikipedia articles

Sat Aug 28 00:58:34 UTC 2010

Hi Irina, and all,

besides looking at single-purpose, possibly costly tools, you should give
more weight to long-lived, free, open-source options:

R is a very high-level scripting language, well suited to text
manipulation and processing, mathematical and statistical analysis, and
rich graphical output. It can be driven from several graphical user
interfaces.

By now R has become a lingua franca, available for almost all computer
systems at
            http://cran.at.r-project.org/
It has documentation in several languages, a journal, mailing lists, and
user conferences for experts and users worldwide.

For your purpose, among the ~2500 add-on packages there is 'tm', whose
vignette
            http://cran.at.r-project.org/web/packages/tm/vignettes/tm.pdf
gives an entry point to text mining and corpus analysis.

Once you have installed R and 'tm', they will give you a basis for your
scientific development(s).
For me, R has been an amazing, enlightening experience since 1996/7,
both for development and for everyday work.
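
To make that first step concrete, here is a minimal sketch of what an
initial session with 'tm' might look like. It assumes a current 'tm'
release (function names such as VCorpus and content_transformer may differ
in older versions), and the two sample sentences are hypothetical
placeholders for text you have already extracted from the articles.

```r
# Install once (needs network access); 'tm' is the text-mining package on CRAN.
# install.packages("tm")
library(tm)

# Hypothetical sample texts standing in for already-extracted article text.
texts <- c(
  "R is a language for statistical computing and graphics.",
  "Text mining turns collections of documents into data for analysis."
)

# Build an in-memory corpus from the character vector.
corpus <- VCorpus(VectorSource(texts))

# Basic clean-up: lower-case, drop punctuation, collapse whitespace.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)

# Document-term matrix: one row per document, one column per term.
dtm <- DocumentTermMatrix(corpus)

# Total frequency of each term across the corpus, most frequent first.
term_freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
print(head(term_freq, 10))
```

From here the document-term matrix can be passed on to R's usual
statistical and graphical functions.
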
Best regards,
Hartmut Oldenbürger, Göttingen University, Germany

---
http://www.wipaed.wiso.uni-goettingen.de/~holdenb1
