[Corpora-List] English tokenizer
Chris Jordan
chris.jordan at acm.org
Thu Aug 16 13:55:47 UTC 2007
I would also suggest OpenNLP. It is the Java package that we use to
split text into sentences.
http://opennlp.sourceforge.net/
I believe its sentence detector uses some form of maximum entropy
model to decide where sentences end, though it has been a while since
I read the accompanying publications.
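
To give a rough idea of what "not only punctuation" means in practice,
here is a minimal, self-contained Java sketch. It is not OpenNLP's code
or API. The class name ToySentenceSplitter, the abbreviation list, and
all weights are made up for illustration; a real maximum entropy model
would learn its features and weights from annotated text. Each '.', '!'
or '?' is scored from its context, and the score is passed through a
logistic function to get a boundary probability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Toy sentence splitter: every '.', '!' or '?' is a candidate boundary,
 * scored by a logistic model over a few context features.
 * The weights are hand-picked assumptions, not trained values.
 */
class ToySentenceSplitter {

    // Hypothetical abbreviation list; a trained model would not need one.
    private static final Set<String> ABBREVIATIONS =
            Set.of("dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc");

    /** Probability that the punctuation mark at index i ends a sentence. */
    static double boundaryProbability(String text, int i) {
        double score = -1.0; // bias: slightly against splitting

        // Feature group 1: the token immediately before the mark.
        int start = i;
        while (start > 0 && !Character.isWhitespace(text.charAt(start - 1))) {
            start--;
        }
        String prev = text.substring(start, i).toLowerCase();
        if (ABBREVIATIONS.contains(prev)) {
            score -= 4.0; // known abbreviation, e.g. "Dr."
        }
        if (prev.length() == 1 && Character.isLetter(prev.charAt(0))) {
            score -= 2.0; // single letter, e.g. an initial
        }

        // Feature group 2: what follows the mark.
        if (i + 1 >= text.length()) {
            score += 5.0; // end of input
        } else if (Character.isWhitespace(text.charAt(i + 1))) {
            score += 1.5; // followed by whitespace
            int j = i + 1;
            while (j < text.length() && Character.isWhitespace(text.charAt(j))) {
                j++;
            }
            if (j >= text.length() || Character.isUpperCase(text.charAt(j))) {
                score += 2.0; // next token starts with a capital letter
            }
        } else {
            score -= 3.0; // mark inside a token, e.g. "3.50"
        }

        return 1.0 / (1.0 + Math.exp(-score)); // logistic function
    }

    /** Splits text wherever the boundary probability exceeds 0.5. */
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int begin = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean candidate = c == '.' || c == '!' || c == '?';
            if (candidate && boundaryProbability(text, i) > 0.5) {
                String sentence = text.substring(begin, i + 1).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
                begin = i + 1;
            }
        }
        String rest = text.substring(begin).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }
}
```

Calling ToySentenceSplitter.split("Dr. Smith paid 3.50 dollars. He left
early!") leaves the periods in "Dr." and "3.50" alone and splits only
after "dollars." and "early!". The hand-set weights are the weak point
of a sketch like this; replacing them with weights estimated from
training data is the part a trained model handles.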
--
Chris Jordan
Computer Science PhD Candidate
Dalhousie University
On 16-Aug-07, at 6:46 AM, ben dbabis samira wrote:
> Hi,
> I would be grateful if you could point me to software (a Java
> implementation) that can split text into sentences (based not
> only on punctuation delimiters).
>
> Thanks for your help
> Samira BEN DBABIS
> MIRACL Laboratory
> Sfax, TUNISIA
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora