Corpora: A multilingual-supportive program

Songlin Piao s.piao at dcs.shef.ac.uk
Mon Jun 4 22:40:07 UTC 2001


Hi, 

A Java program with multilingual support and a graphical interface for regular-expression searching, text encoding conversion and sentence/paragraph/title delimitation is downloadable from my website: http://www.dcs.shef.ac.uk/~piao/Research/DownLoad/download.htm

In addition to Unicode, it can read text in numerous other encodings, and with a Unicode font it can display many languages.
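For readers unfamiliar with encoding conversion in Java, here is a minimal sketch of the general technique: decode the bytes with the source charset into a Java `String`, then re-encode that string with the target charset. This is not the code of the program above. The class and method names (`EncodingConvert`, `convert`) are hypothetical, and the example sticks to charsets every Java platform is required to support. Legacy encodings such as GBK or EUC-KR go through the same calls, but whether they are available depends on the installed JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating charset conversion; not the announced program's code.
public class EncodingConvert {

    // Decode bytes in the source encoding into a String (UTF-16 internally),
    // then encode that String into the target encoding.
    public static byte[] convert(byte[] input, String fromCharset, String toCharset) {
        String text = new String(input, Charset.forName(fromCharset));
        return text.getBytes(Charset.forName(toCharset));
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the Chinese word for "Chinese", written with escapes
        // so the source file's own encoding does not matter.
        byte[] utf16 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_16BE);
        byte[] utf8 = convert(utf16, "UTF-16BE", "UTF-8");
        // Each of these two characters takes 3 bytes in UTF-8.
        System.out.println(utf8.length + " bytes: " + new String(utf8, StandardCharsets.UTF_8));
    }
}
```

Because Java strings are Unicode internally, any pair of supported encodings can be converted this way, as long as the target encoding can represent every character in the text.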

The sentence/paragraph splitting function works well even on irregularly formatted text, as long as punctuation marks are not missing. It has been tested only on English, Chinese and Korean texts, but it should work on other languages that use the same punctuation marks as English.
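The dependence on punctuation can be shown with a minimal, hypothetical sketch of punctuation-based sentence splitting. It assumes that sentences end with `.`, `!` or `?`, or with the full-width CJK forms `。`, `！` and `？`. An English terminator only counts when whitespace or the end of the text follows it, so "3.14" is not split. A CJK terminator always ends a sentence, because Chinese text normally has no space after it. This is an illustration of the general idea, not the program's actual algorithm. Like the program, it cannot find a boundary where punctuation is missing, and unlike a full splitter it does not handle abbreviations such as "e.g.".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical punctuation-based splitter; not the announced program's algorithm.
public class SentenceSplitter {

    // English sentence-final marks plus full-width CJK full stop, exclamation and question marks.
    private static boolean isTerminator(char c) {
        return c == '.' || c == '!' || c == '?'
            || c == '\u3002' || c == '\uFF01' || c == '\uFF1F';
    }

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            current.append(c);
            if (!isTerminator(c)) {
                continue;
            }
            // Absorb runs of terminators such as "?!" or "...".
            while (i + 1 < text.length() && isTerminator(text.charAt(i + 1))) {
                current.append(text.charAt(++i));
            }
            boolean atEnd = i + 1 >= text.length();
            boolean cjk = text.charAt(i) >= '\u3000'; // full-width marks need no following space
            if (atEnd || cjk || Character.isWhitespace(text.charAt(i + 1))) {
                String s = current.toString().trim();
                if (!s.isEmpty()) {
                    sentences.add(s);
                }
                current.setLength(0);
            }
        }
        // Keep any trailing text that lacks final punctuation.
        String rest = current.toString().trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you? \u4f60\u597d\u3002\u518d\u89c1\uFF01";
        for (String s : split(text)) {
            System.out.println(s);
        }
    }
}
```

The example input yields four sentences: two English ones and two Chinese ones.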

For details, please have a look at the webpage.

Scott Piao
------------------------------------
Dept. of Computer Science
University of Sheffield
Email: s.piao at dcs.shef.ac.uk