[Corpora-List] Spanish tokenizer

Maria Esteva mesteva at mail.utexas.edu
Mon Oct 16 13:31:10 UTC 2006


Dear all,

I am a PhD student in the School of Information, University of Texas
at Austin. For my dissertation, I will text mine a large set of
corporate electronic records in Spanish. For this, I need to find an
open-source Spanish tokenizer, preferably in C++, although other
languages would be fine as well. I am familiar with the Lucene tool
set, so if you know of another source where I can find such a tool, I
would appreciate your help.

Thanks in advance,

Maria Esteva


