[Corpora-List] spanish tokenizer

Jorge Civera Saiz jorcisai at iti.upv.es
Mon Oct 16 14:07:35 UTC 2006


Hi Maria,

Take a look at FreeLing:

FreeLing 1.5: An Open Source Suite of Language Analyzers

Here you can find information about FreeLing, an open source language analysis
tool suite, released under the GNU Lesser General Public License (LGPL) of the
Free Software Foundation.

These tools have been developed at the TALP Research Center of the Universitat
Politècnica de Catalunya. The Spanish and Catalan morphological dictionaries and
grammars were initially developed by the Centre de Llenguatge i Computació of the
Universitat de Barcelona, and have since been improved and extended to other
languages thanks to many contributions.

www: http://garraf.epsevg.upc.es/freeling/
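
In case it helps to see what the tokenization step itself involves, here is a
minimal sketch in Python. It is not FreeLing's code or API; the pattern and the
name tokenize are my own assumptions for illustration. It only splits Spanish
text into words, numbers, and punctuation marks, and does not handle
abbreviations, clitics, or multiword expressions, which a full analyzer would.

```python
import re

# Hypothetical pattern, for illustration only (not taken from FreeLing).
TOKEN_RE = re.compile(
    r"\d+(?:[.,]\d+)*"   # numbers, including separators: 1.500,50 or 3,14
    r"|[^\W\d_]+"        # runs of Unicode letters (á, ñ, ü are included)
    r"|[^\w\s]|_"        # any single punctuation mark: ¿ ? ¡ ! . , _
)

def tokenize(text):
    """Return the tokens of `text` in order; whitespace is discarded."""
    return TOKEN_RE.findall(text)

if __name__ == "__main__":
    print(tokenize("¿Cuánto cuesta? Cuesta 1.500,50 euros."))
```

Python 3 strings are Unicode, so accented letters and the inverted marks ¿ and ¡
need no special treatment here. A C++ version would have to deal with the text
encoding explicitly.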

Best regards,

Jorge


Quoting Maria Esteva <mesteva at mail.utexas.edu>:

> Dear all,
> 
> I am a PhD student in the School of Information, University of Texas
> at Austin. For my dissertation, I will text-mine a large set of
> corporate electronic records in Spanish. For this, I need to find an
> open source Spanish tokenizer, if possible in C++, although other
> languages would be fine as well. I am familiar with the Lucene tool
> set, so if you know of another source where I can find such a tool, I
> would appreciate your help.
> 
> Thanks in advance,
> 
> Maria Esteva
> 
> 



