[Corpora-List] English tokenizer

Chris Jordan chris.jordan at acm.org
Thu Aug 16 13:55:47 UTC 2007


I would also suggest OpenNLP. It is the Java package we use to split
text into sentences.
http://opennlp.sourceforge.net/

I believe it uses some form of maximum entropy model to decide where
sentences end. It has been a while since I read its accompanying
publications.
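To illustrate the general idea (this is not OpenNLP's code or model), a maximum entropy sentence splitter treats each '.', '?' or '!' followed by whitespace or end of text as a candidate boundary. It extracts contextual features, such as whether the preceding token is a known abbreviation and whether the next word is capitalized, and scores the candidate with a log-linear model. With only two outcomes (boundary / not boundary), the maxent probability reduces to a logistic function of the summed feature weights. In the sketch below the feature names, the abbreviation list and the weights are all hypothetical and hand-picked. A real system such as OpenNLP learns its weights from annotated training text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical maxent-style splitter: hand-picked weights, NOT a trained model.
final class MaxEntStyleSplitter {

    // Hypothetical abbreviation list (lowercased, without the final period).
    private static final Set<String> ABBREVIATIONS =
            Set.of("mr", "mrs", "dr", "prof", "e.g", "i.e", "etc");

    // Illustrative weights for the "boundary" outcome; "no boundary" has all-zero weights.
    private static final Map<String, Double> WEIGHTS = Map.of(
            "bias", 1.0,
            "prevIsAbbrev", -4.0,
            "prevIsSingleLetter", -3.5,
            "nextIsCapitalized", 2.0,
            "nextIsLowercase", -3.0,
            "atEndOfText", 5.0);

    // Collect the names of the features that fire for the candidate at position i.
    static List<String> features(String text, int i) {
        List<String> f = new ArrayList<>();
        f.add("bias");
        // Token immediately before the candidate punctuation mark.
        int start = i;
        while (start > 0 && !Character.isWhitespace(text.charAt(start - 1))) start--;
        String prev = text.substring(start, i).toLowerCase();
        if (ABBREVIATIONS.contains(prev)) f.add("prevIsAbbrev");
        if (prev.length() == 1 && Character.isLetter(prev.charAt(0))) f.add("prevIsSingleLetter");
        // First non-whitespace character after the candidate.
        int j = i + 1;
        while (j < text.length() && Character.isWhitespace(text.charAt(j))) j++;
        if (j == text.length()) f.add("atEndOfText");
        else if (Character.isUpperCase(text.charAt(j))) f.add("nextIsCapitalized");
        else if (Character.isLowerCase(text.charAt(j))) f.add("nextIsLowercase");
        return f;
    }

    // P(boundary | context) = exp(s) / (exp(s) + exp(0)) = 1 / (1 + exp(-s)).
    static double boundaryProbability(String text, int i) {
        double s = 0.0;
        for (String name : features(text, i)) s += WEIGHTS.getOrDefault(name, 0.0);
        return 1.0 / (1.0 + Math.exp(-s));
    }

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int sentenceStart = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '.' && c != '?' && c != '!') continue;
            // Only punctuation followed by whitespace or end of text is a candidate.
            boolean candidate = i + 1 == text.length() || Character.isWhitespace(text.charAt(i + 1));
            if (candidate && boundaryProbability(text, i) > 0.5) {
                String s = text.substring(sentenceStart, i + 1).trim();
                if (!s.isEmpty()) sentences.add(s);
                sentenceStart = i + 1;
            }
        }
        String rest = text.substring(sentenceStart).trim();
        if (!rest.isEmpty()) sentences.add(rest);
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Dr. Smith arrived at 5 p.m. today. He left early! Did he?";
        for (String s : split(text)) System.out.println(s);
    }
}
```

On the example in `main`, the abbreviation feature keeps "Dr." from ending a sentence, and the lowercase-next-word feature does the same for "p.m.". A pure punctuation splitter would break at both.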
-- 
Chris Jordan
Computer Science PhD Candidate
Dalhousie University


On 16-Aug-07, at 6:46 AM, ben dbabis samira wrote:

> Hi,
> I would be grateful if you could give me references to software (a Java
> implementation) that can tokenize text into sentences (based not
> only on punctuation delimiters).
>
> Thanks for your help
> Samira BEN DBABIS
> MIRACL Laboratory
> Sfax, TUNISIA
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
