[Corpora-List] freely available tagger & segmenter for Portuguese, Spanish, and/or French
Raphael Mudge
raffi at automattic.com
Tue Dec 22 16:57:29 UTC 2009
Hi Dan,
Java includes sentence segmentation technology that was contributed by
IBM. It's the java.text.BreakIterator class (intuitively named). I
conducted a quick survey of available text segmentation tools for Java
recently:
http://blog.afterthedeadline.com/2009/11/17/sentence-segmentation-survey-for-java/
Of course, a more thorough one was posted to this list a while back:
http://mailman.uib.no/public/corpora/2007-October/005429.html
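For anyone who wants to try BreakIterator quickly, here is a minimal sketch of sentence segmentation with it. The class name SentenceSplitter, the split helper, and the sample Spanish text are made up for illustration; only java.text.BreakIterator and java.util.Locale come from the JDK. As far as I know the segmentation is rule-based, so abbreviations followed by a capitalized word may still produce false breaks.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of any library mentioned above.
class SentenceSplitter {

    // Splits text into sentences using the JDK's locale-aware BreakIterator.
    static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);

        List<String> sentences = new ArrayList<String>();
        int start = it.first();
        // Walk the boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample input is an assumption chosen for illustration only.
        String text = "Hola mundo. Esto es una prueba.";
        for (String s : split(text, new Locale("es"))) {
            System.out.println(s);
        }
    }
}
```

Passing a different Locale (for example new Locale("pt") or Locale.FRENCH) selects the rules for that language, where the JDK provides them.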
As for POS taggers--I believe LanguageTool
( http://www.languagetool.org ) [LGPL - commercial use OK] has a POS
tagger for Spanish and French.
Best of luck.
-- Raphael
Raphael Mudge
Code Wrangler, Automattic
http://www.afterthedeadline.com
On Dec 22, 2009, at 10:17 AM, Daniel Da Silva De Paiva wrote:
> Hi there,
>
> I'd like to know if someone could point me in the right direction(s)
> here.
>
> I need to get taggers and sentence segmenters for the following
> languages
> (Portuguese, Spanish, and/or French) or gold standard tagged corpora
> (for the same languages) that would allow me to train them.
>
> The main point is that the resources should be freely available for
> commercial usage, but I'd be interested to know those that are freely
> available for academic usage as well.
>
> Thanks a lot,
>
> :Dan Paiva
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora