[Corpora-List] freely available tagger & segmenter for Portuguese, Spanish, and/or French

Raphael Mudge raffi at automattic.com
Tue Dec 22 16:57:29 UTC 2009


Hi Dan,
Java includes sentence segmentation technology that was contributed by  
IBM. It's the java.text.BreakIterator class (intuitively named). I  
conducted a quick survey of available text segmentation tools for Java  
recently:

http://blog.afterthedeadline.com/2009/11/17/sentence-segmentation-survey-for-java/

Of course, a more thorough one was posted to this list a while back:

http://mailman.uib.no/public/corpora/2007-October/005429.html
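
For reference, here is a minimal sketch of how java.text.BreakIterator can be used to split text into sentences. The class name SentenceSplitter, the split helper, and the sample Portuguese text are made up for illustration; only the BreakIterator and Locale calls come from the JDK. Treat it as a starting point rather than a tuned segmenter.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of any library mentioned in this thread.
public class SentenceSplitter {

    // Splits text into sentences using the JDK's locale-aware sentence iterator.
    public static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Each call to next() returns the offset of the following sentence boundary.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample text is an assumption chosen for illustration.
        String text = "O gato dormiu. A chuva parou! Vamos sair?";
        for (String sentence : split(text, Locale.forLanguageTag("pt"))) {
            System.out.println(sentence);
        }
    }
}
```

The built-in rules are generic, so the iterator may break after abbreviations that end in a period (titles, for example). It is worth checking the output on your own Portuguese, Spanish, and French data before relying on it.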

As for POS taggers, I believe LanguageTool (http://www.languagetool.org)
[LGPL, commercial use OK] has a POS tagger for Spanish and French.

Best of luck.

-- Raphael

Raphael Mudge
Code Wrangler, Automattic
http://www.afterthedeadline.com

On Dec 22, 2009, at 10:17 AM, Daniel Da Silva De Paiva wrote:

> Hi there,
>
> I'd like to know if someone could point me in the right direction(s)  
> here.
>
> I need to get taggers and sentence segmenters for the following  
> languages
> (Portuguese, Spanish, and/or French) or gold standard tagged corpora  
> (for the same languages) that would allow me to train them.
>
> The main point is that the resources should be freely available for
> commercial usage, but I'd be interested to know those that are freely
> available for academic usage as well.
>
> Thanks a lot,
>
>     :Dan Paiva
>
>
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora




