[Corpora-List] spanish tokenizer

Marco Baroni baroni at sslmit.unibo.it
Mon Oct 16 14:59:20 UTC 2006


The FreeLing suite includes an open-source Spanish tokenizer implemented in 
C++:

http://garraf.epsevg.upc.es/freeling/index.php

Regards,

Marco


Maria Esteva wrote:
> Dear all,
> 
> I am a PhD student in the School of Information, University of Texas at 
> Austin. For my dissertation, I will text mine a large set of corporate 
> electronic records in Spanish. For this, I need to find an open-source 
> Spanish tokenizer, if possible in C++, although other languages would be 
> fine as well. I am familiar with the Lucene tool set, so if you know 
> of another source where I can find such a tool, I would appreciate your 
> help.
> 
> Thanks in advance,
> 
> Maria Esteva
> 


