[Corpora-List] spanish tokenizer
Marco Baroni
baroni at sslmit.unibo.it
Mon Oct 16 14:59:20 UTC 2006
The FreeLing suite includes an open-source Spanish tokenizer implemented in
C++:
http://garraf.epsevg.upc.es/freeling/index.php
Regards,
Marco
Maria Esteva wrote:
> Dear all,
>
> I am a PhD student in the School of Information, University of Texas at
> Austin. For my dissertation, I will text mine a large set of corporate
> electronic records in Spanish. For this, I need to find an open source
> Spanish tokenizer, if possible in C++, although other languages would be
> fine as well. I am familiar with the Lucene tool set, so if you know
> of another source where I can find such a tool, I would appreciate your
> help.
>
> Thanks in advance,
>
> Maria Esteva
>