[Corpora-List] summary: free sentencizers ; test different sentencizers with cgi script
Joerg Schuster
js at cis.uni-muenchen.de
Fri Mar 7 15:13:37 UTC 2003
Hello,
I recently asked for free sentencizers. Here is most of the information I received:
+-----------------+------------+--------------------------------------------------------+-------------+
|Name/Nickname    |Author      |Web Site                                                |Comment      |
+-----------------+------------+--------------------------------------------------------+-------------+
|ave              |Ave Wrigley |http://search.cpan.org/author/TGROSE/HTML-Summary-0.017/|Perl module  |
+-----------------+------------+--------------------------------------------------------+-------------+
|mxterminator     |Adwait      |http://www.cis.upenn.edu/~adwait/statnlp.html           |Java,        |
|                 |Ratnaparkhi |                                                        |probabilistic|
+-----------------+------------+--------------------------------------------------------+-------------+
|satz             |David       |http://elib.cs.berkeley.edu/src/satz/                   |written in C,|
|                 |D. Palmer   |                                                        |has to be    |
|                 |            |                                                        |trained      |
+-----------------+------------+--------------------------------------------------------+-------------+
|sentence.cgi     |?           |http://misshoover.si.umich.edu/~zzheng/sentence/        |CGI script   |
+-----------------+------------+--------------------------------------------------------+-------------+
|shlomo           |Shlomo Yona |http://search.cpan.org/author/SHLOMOY/                  |Perl module  |
|                 |            |Lingua-EN-Sentence-0.25/lib/Lingua/EN/Sentence.pm       |             |
+-----------------+------------+--------------------------------------------------------+-------------+
|ttt              |?           |http://www.ltg.ed.ac.uk/software/ttt/index.html         |Seems to be  |
|                 |            |                                                        |available    |
|                 |            |                                                        |only for     |
|                 |            |                                                        |SPARC        |
|                 |            |                                                        |machines     |
+-----------------+------------+--------------------------------------------------------+-------------+
You can test the programs ave, mxterminator and shlomo here:
http://www.cis.uni-muenchen.de/~js/sentencize
If you run non-trivial tests, please let me know the results.
So far I have run only a *very* simple test with ave, mxterminator and shlomo:
I had each of them sentencize the 15-sentence test corpus given below.
mxterminator and shlomo each recognized 6 of the 15 sentences correctly; ave
recognized 8.
Jörg Schuster
Here is the 15-sentence test corpus I used:
It is 0.025-in. long. A. lives in the U.S. John
Mackenzie Jr. lives in Dallas, Tex. This is a
fact. At 3.p.m. Continental finalized its
offer. Complaints should be sent to
Dr. White. He stopped at Meadows Dr. White
Falcon was still open. This happened at 3 p.m. Did Conti-
nental finalize its offer? "There is such a quantity
of unknown and instructive documents" -- H. A.
Taine, August 1875. The cost is $95.40 per average field
trip; John, pay attention! How is infection transmitted?
It is not transmitted from: giving blood/mosquito
bites/toilet seats/kissing/from normal day-to-day contact.
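For anyone who wants to repeat this kind of test, here is a minimal Python
sketch of how such a count could be produced. It is not how ave, mxterminator
or shlomo work, and it is not the script behind the CGI page. The splitter, its
abbreviation list, the function names and the exact-match scoring rule are all
assumptions made for illustration; the gold segmentations in the usage lines
are my own reading of a few corpus sentences.

```python
import re

# Hypothetical abbreviation list, chosen only to illustrate the idea.
ABBREVIATIONS = {"dr.", "jr.", "u.s.", "p.m."}

def naive_sentencize(text):
    """Split after '.', '!' or '?' when followed by whitespace and an
    uppercase letter or a double quote, unless the token before the
    break is a known abbreviation."""
    text = " ".join(text.split())  # undo hard line wraps
    sentences, start = [], 0
    for m in re.finditer(r'[.!?]\s+(?=["A-Z])', text):
        candidate = text[start:m.start() + 1]
        last_token = candidate.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # treat as abbreviation, keep scanning
        sentences.append(candidate)
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

def count_correct(predicted, gold):
    """Count predicted sentences that exactly match a gold sentence
    (an assumed scoring rule)."""
    gold_set = set(gold)
    return sum(1 for s in predicted if s in gold_set)

# Usage on two fragments of the test corpus.
easy = "This is a\nfact. Complaints should be sent to\nDr. White. How is infection transmitted?"
easy_gold = ["This is a fact.",
             "Complaints should be sent to Dr. White.",
             "How is infection transmitted?"]
print(count_correct(naive_sentencize(easy), easy_gold))  # prints 3

hard = "He stopped at Meadows Dr. White\nFalcon was still open."
hard_gold = ["He stopped at Meadows Dr.", "White Falcon was still open."]
print(count_correct(naive_sentencize(hard), hard_gold))  # prints 0
```

The second fragment shows why the corpus is hard: the same "Dr." that must not
end a sentence in "Dr. White" does end one in "Meadows Dr.", so a fixed
abbreviation list gets one of the two cases wrong.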