[Corpora-List] spanish tokenizer
Maria Esteva
mesteva at mail.utexas.edu
Mon Oct 16 13:31:10 UTC 2006
Dear all,
I am a PhD student in the School of Information, University of Texas
at Austin. For my dissertation, I will text mine a large set of
corporate electronic records in Spanish. For this, I need to find an
open source Spanish tokenizer, preferably in C++, although other
languages would be fine as well. I am familiar with the Lucene tool
set, so if you know of another source where I can find such a tool, I
would appreciate your help.
Thanks in advance,
Maria Esteva