[Corpora-List] summary: free sentencizers; test different sentencizers with cgi script
Joerg Schuster
js at cis.uni-muenchen.de
Mon Mar 10 09:05:03 UTC 2003
> 1. The test passes the text using GET method and does not "escape" the
> text before sent to the server. This can easily crash your test program.
I will improve this. But it may take some time, because our system
administrators will install a new operating system on our web server
tomorrow. (And I am not sure how and if things will work after that.)
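For reference, escaping the text before it goes into a GET request could look roughly like the sketch below. This is only an illustration of percent-encoding, not the actual code of the test script: the name escape_query_value and the example.org URL are made up, and the input is assumed to be a Perl character string that should be sent as UTF-8. (The CPAN module URI::Escape provides uri_escape for the same job.)

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode a string for use as a query value.
# Everything except the unreserved characters A-Z a-z 0-9 - . _ ~ is
# replaced by %XX, byte by byte, after encoding the text as UTF-8.
sub escape_query_value {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# The URL is a placeholder, not the address of the real test script.
my $url = 'http://example.org/cgi-bin/sentencize.cgi?text='
        . escape_query_value('Is this one sentence? Yes & no.');
print "$url\n";
```

With this, characters such as "?", "&" and spaces no longer end up unescaped in the query string. For long texts, POST would probably be the better choice anyway.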
> 3. As I am also on this mailing list, I'd be happy to accept bug-reports and
> feature requests and further develop this software. Hopefully, if there is
> enough interest it will grow to be good enough so everyone can use
> it.
I think one of the disadvantages of your program is that it stores
all data in main memory. You have to say something like
    my $sentences = get_sentences($in);
Though this is very convenient when dealing with small files, I would
rather say something like
    while (<>) {
        print_sentences($_);
    }
Then huge files could easily be sentencized, too.
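To make the idea concrete, here is a minimal sketch of such a streaming interface. It is not taken from any of the sentencizers discussed; the names take_sentences and print_sentences are only illustrative, and the splitting rule (a sentence ends at ".", "!" or "?" followed by whitespace) is deliberately naive and will break on abbreviations. The point is only that the buffer never holds more than the current unfinished sentence, since a sentence may span several input lines.

```perl
use strict;
use warnings;

# Hypothetical helper: removes every complete sentence from the front of
# the buffer and returns them. An unfinished trailing sentence stays in
# the buffer until more input arrives.
sub take_sentences {
    my ($buf_ref) = @_;
    my @sentences;
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    while ($$buf_ref =~ s/^\s*(.+?[.!?])\s+//s) {
        my $sentence = $1;
        $sentence =~ s/\s+/ /g;    # join sentences broken across lines
        push @sentences, $sentence;
    }
    return @sentences;
}

# Reads $in line by line and writes one sentence per line to $out.
sub print_sentences {
    my ($in, $out) = @_;
    my $buffer = '';
    while (my $line = <$in>) {
        $buffer .= $line;
        print {$out} "$_\n" for take_sentences(\$buffer);
    }
    # Flush what is left at end of input.
    $buffer =~ s/^\s+|\s+$//g;
    $buffer =~ s/\s+/ /g;
    print {$out} "$buffer\n" if length $buffer;
}

# Demo on an in-memory file; for real input pass \*STDIN instead.
open my $demo, '<', \"First one. Second\nsentence here! Third?\n" or die $!;
print_sentences($demo, \*STDOUT);
```

The demo prints "First one.", "Second sentence here!" and "Third?" on separate lines. Called as print_sentences(\*STDIN, \*STDOUT), the same code would work as a filter on files of any size.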
Jörg