[Corpora-List] summary: free sentencizers; test different sentencizers with cgi script

Joerg Schuster js at cis.uni-muenchen.de
Mon Mar 10 09:05:03 UTC 2003


> 1. The test passes the text using the GET method and does not "escape" the
> text before it is sent to the server. This can easily crash your test program.

I will improve this. But it may take some time, because our system
administrators will install a new operating system on our web server
tomorrow. (And I am not sure how and if things will work after that.)
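
A minimal sketch of the kind of escaping meant in the quoted point, assuming
the text travels as a query-string value. The function name
escape_query_value, the script URL and the parameter name "text" are
hypothetical and only serve the illustration; they are not taken from the
actual test script.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Percent-encode a text so it can be placed safely in a GET query string.
# Everything except the RFC 3986 "unreserved" characters is escaped.
# (Hypothetical helper, not part of the test script discussed here.)
sub escape_query_value {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);    # escape bytes, not characters
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# Hypothetical URL and parameter name, for illustration only.
my $url = 'http://example.org/cgi-bin/sentencizer.pl?text='
        . escape_query_value('Is this ok? Yes & no.');
print "$url\n";
```

For long texts, sending the data with POST instead of GET would also avoid
URL length limits; the escaping shown here covers only the GET case.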

> 3. As I am also on this mailing list, I'd be happy to accept bug-reports and
> feature requests and further develop this software. Hopefully, if there is
> enough interest it will grow to be good enough so everyone can use
> it.

I think one of the disadvantages of your program is that it stores
all data in main memory. You have to say something like

 my $sentences=get_sentences($in);

Though this is very convenient when dealing with small files, I would
rather say something like

 while (<>) {
     print_sentences;
 }

Then huge files could easily be sentencized, too.
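
A rough sketch of how such a streaming interface might look, under stated
assumptions: the function name print_sentences_from and the boundary rule
(a sentence ends at ".", "!" or "?" followed by whitespace) are hypothetical
and far cruder than a real sentencizer. The point is only that the buffer
never holds more than one unfinished sentence, so memory use does not grow
with the size of the file.

```perl
use strict;
use warnings;

# Read lines from a filehandle and hand each complete sentence to a callback.
# Naive boundary rule, for illustration only: '.', '!' or '?' followed by
# whitespace. Abbreviations such as "e.g." are split wrongly by this rule.
sub print_sentences_from {
    my ($in, $emit) = @_;
    my $buffer = '';    # holds only the current, still incomplete sentence
    while (my $line = <$in>) {
        $line =~ s/\s+/ /g;    # the trailing newline becomes a space
        $buffer .= $line;
        # Emit every sentence that is already complete in the buffer.
        while ($buffer =~ s/^\s*(.+?[.!?])\s+//) {
            $emit->($1);
        }
    }
    # At end of input, whatever is left counts as the last sentence.
    $buffer =~ s/^\s+|\s+$//g;
    $emit->($buffer) if length $buffer;
    return;
}

# Demo on an in-memory filehandle; a sentence may span several lines.
my $text = "This is one. This is\ntwo! And a third\nwithout final newline?";
open my $in, '<', \$text or die "cannot open in-memory file: $!";
print_sentences_from($in, sub { print "$_[0]\n" });
close $in;
```

Passing \*STDIN as the filehandle would turn the same function into a filter
for arbitrarily large input.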

Jörg


