[Corpora-List] Extracting text from Wikipedia articles
Hartmut Oldenbürger
jcleese at liteline.de
Sat Aug 28 00:58:34 UTC 2010
Hi Irina, and all,
Besides looking at individual, possibly costly tools, you should give more
weight to long-lived, free open-source options:
R is a very high-level scripting language, well suited to text manipulation
and processing, mathematical and statistical analysis, and rich graphical
output. It can be driven from several graphical user interfaces.
By now R has become a lingua franca, available for almost all computer systems
at http://cran.at.r-project.org/
It has multi-language documentation, a journal, mailing lists, and user
conferences for experts and users worldwide.
For your purpose, among the ~2500 contributed packages there is 'tm':
http://cran.at.r-project.org/web/packages/tm/vignettes/tm.pdf
This vignette is an entry point to text mining and corpus analysis.
After installing R and 'tm', you will have a basis for your
scientific development(s).
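To give an idea of what that basis looks like, here is a minimal sketch, assuming a recent 'tm' release (names such as VCorpus may differ in older versions). The three sample sentences are hypothetical stand-ins for text you have already extracted from articles; the extraction itself is not shown.

```r
# One-time install from CRAN (uncomment if 'tm' is not yet installed):
# install.packages("tm")
library(tm)

# Hypothetical sample texts standing in for extracted article text
texts <- c(
  "R is a language for statistical computing and graphics.",
  "Text mining in R: the tm package handles corpora.",
  "Corpora are collections of texts for linguistic analysis."
)

# Build an in-memory corpus, one document per character string
corpus <- VCorpus(VectorSource(texts))

# Basic cleaning: lower-case, drop punctuation and English stop words,
# then collapse the leftover runs of whitespace
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Document-term matrix: rows are documents, columns are terms
# (by default, terms shorter than three characters are dropped)
dtm <- DocumentTermMatrix(corpus)

# Total frequency of each term across all documents, most frequent first
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
print(head(freq, 10))
```

From the document-term matrix you can go on to the statistical and graphical facilities of R itself; the vignette covers further steps such as stemming and reading documents from files.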
For me, it has been an amazing, enlightening experience for development
and work since 1996/7.
Best regards - Hartmut Oldenbürger, Göttingen University, Germany
---
http://www.wipaed.wiso.uni-goettingen.de/~holdenb1