[Corpora-List] summary: free sentencizers; test different sentencizers with CGI script

Joerg Schuster js at cis.uni-muenchen.de
Fri Mar 7 15:13:37 UTC 2003


Hello

Recently I asked the list for free sentencizers (programs that split running text into sentences). This is most of the information I received:

+-----------------+------------+--------------------------------------------------------+-------------+
|Name/Nickname    |Author      |Web Site                                                |Comment      |
+-----------------+------------+--------------------------------------------------------+-------------+
|ave              |Ave Wrigley |http://search.cpan.org/author/TGROSE/HTML-Summary-0.017/|Perl module  |
|                 |            |                                                        |             |
+-----------------+------------+--------------------------------------------------------+-------------+
|mxterminator     |Adwait      |http://www.cis.upenn.edu/~adwait/statnlp.html           |Java,        |
|                 |Ratnaparkhi |                                                        |probabilistic|
+-----------------+------------+--------------------------------------------------------+-------------+
|satz             |David       |http://elib.cs.berkeley.edu/src/satz/                   |written in C,|
|                 |D. Palmer   |                                                        |has to be    |
|                 |            |                                                        |trained      |
+-----------------+------------+--------------------------------------------------------+-------------+
|sentence.cgi     |?           |http://misshoover.si.umich.edu/~zzheng/sentence/        |CGI script   |
+-----------------+------------+--------------------------------------------------------+-------------+
|shlomo           |Shlomo Yona |http://search.cpan.org/author/SHLOMOY/                  |Perl module  |
|                 |            |Lingua-EN-Sentence-0.25/lib/Lingua/EN/Sentence.pm       |             |
+-----------------+------------+--------------------------------------------------------+-------------+
|ttt              |?           |http://www.ltg.ed.ac.uk/software/ttt/index.html         |Seems to be  |
|                 |            |                                                        |available    |
|                 |            |                                                        |only for     |
|                 |            |                                                        |SPARC        |
|                 |            |                                                        |machines     |
+-----------------+------------+--------------------------------------------------------+-------------+

You can test ave, mxterminator and shlomo through a CGI script here:

http://www.cis.uni-muenchen.de/~js/sentencize

If you run any non-trivial tests, please let me know the results.

I have only performed a *very* simple test with ave, mxterminator and shlomo:
I had them sentencize the 15-sentence test corpus given below. Of the 15
sentences, mxterminator and shlomo each recognized 6 correctly; ave recognized
8 correctly.
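
The post does not spell out how "correctly recognized" was scored. A minimal sketch of one plausible scoring, assuming a sentence counts only when both of its boundaries match a hand-made gold segmentation (normalize and count_correct are hypothetical helper names, not part of any tool listed above):

```python
from collections import Counter


def normalize(sentence: str) -> str:
    """Collapse all runs of whitespace, including line breaks, to one space."""
    return " ".join(sentence.split())


def count_correct(gold: list[str], predicted: list[str]) -> int:
    """Count predicted sentences that exactly match a gold-standard sentence.

    Assumption: a sentence is "correctly recognized" only if both its start
    and its end agree with the gold segmentation. Using Counter intersection
    means a duplicated sentence is credited at most as often as it occurs
    in the gold standard.
    """
    gold_counts = Counter(normalize(s) for s in gold)
    pred_counts = Counter(normalize(s) for s in predicted)
    return sum((gold_counts & pred_counts).values())
```

For example, with gold ["This is a fact."] and predicted ["This is a\nfact.", "At 3.p.m."], count_correct returns 1, because line breaks are collapsed before the comparison and the unmatched piece earns nothing.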

Jörg Schuster


Here is the 15-sentence test corpus I used:


It is 0.025-in. long. A. lives in the U.S. John
Mackenzie Jr. lives in Dallas, Tex. This is a
fact. At 3.p.m. Continental finalized its
offer. Complaints should be sent to
Dr. White. He stopped at Meadows Dr. White
Falcon was still open. This happened at 3 p.m. Did Conti-
nental finalize its offer?  "There is such a quantity
of unknown and instructive documents" -- H. A.
Taine, August 1875. The cost is $95.40 per average field
trip; John, pay attention! How is infection transmitted?
It is not  transmitted from: giving blood/mosquito
bites/toilet seats/kissing/from normal day-to-day contact.
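
The corpus contains look-alike contexts that must be segmented differently: in "sent to Dr. White." the abbreviation is a title, while in "Meadows Dr. White Falcon" it presumably stands for "Drive" and ends the sentence. A deliberately naive splitter, sketched here as a hypothetical baseline (it is not one of the tools in the table), shows why a single punctuation-plus-capital rule cannot get both cases right:

```python
import re

# Hypothetical baseline rule: a boundary is any run of whitespace that
# follows ".", "?" or "!" and precedes an uppercase letter or a double quote.
NAIVE_BOUNDARY = re.compile(r'(?<=[.?!])\s+(?=["A-Z])')


def naive_sentencize(text: str) -> list[str]:
    """Split text into sentences using only the rule above."""
    text = " ".join(text.split())  # join wrapped lines into one line
    return [s for s in NAIVE_BOUNDARY.split(text) if s]
```

On the corpus text from "Complaints should be sent to" through "Falcon was still open.", it returns four pieces: "Complaints should be sent to Dr.", "White.", "He stopped at Meadows Dr." and "White Falcon was still open.". The split after "Meadows Dr." is right only by accident, and the split after "sent to Dr." is wrong.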


