Corpora: A multilingual-supportive program
Songlin Piao
s.piao at dcs.shef.ac.uk
Mon Jun 4 22:40:07 UTC 2001
Hi,
A Java program with multilingual support and a graphical interface, for regular-expression searching, text encoding conversion and sentence/paragraph/title delimitation, is downloadable from my website: http://www.dcs.shef.ac.uk/~piao/Research/DownLoad/download.htm
In addition to Unicode, it can read text written in numerous encodings. With a Unicode font, it can display many languages.
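To illustrate what encoding conversion involves in Java, here is a minimal sketch. It is not the code of the program announced here; the class name EncodingConvertSketch and the method convert are hypothetical, and only the standard java.nio.charset classes are used. It decodes bytes from one named encoding and re-encodes them in another.

```java
import java.nio.charset.Charset;

public class EncodingConvertSketch {

    // Hypothetical helper: decode the input bytes using the source encoding,
    // then encode the resulting text using the target encoding.
    public static byte[] convert(byte[] input, String fromCharset, String toCharset) {
        // Charset.forName throws an unchecked exception if the name is
        // illegal or the encoding is not supported by the running JVM.
        String decoded = new String(input, Charset.forName(fromCharset));
        return decoded.getBytes(Charset.forName(toCharset));
    }

    public static void main(String[] args) {
        // "café" encoded in ISO-8859-1: the final byte 0xE9 is the e-acute.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] utf8 = convert(latin1, "ISO-8859-1", "UTF-8");
        System.out.println(utf8.length + " bytes after conversion to UTF-8");
    }
}
```

ISO-8859-1 and UTF-8 are available on every Java platform. Whether other encodings (for example those commonly used for Chinese or Korean text) are available depends on the Java runtime installed.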
The sentence/paragraph splitting function works quite well on irregularly formatted text, except where punctuation marks are missing. It has been tested only on English, Chinese and Korean texts, but it should work on other languages that use the same punctuation marks as English.
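For readers who want a feel for punctuation-based sentence delimitation, here is a rough sketch in Java. It is an assumption-laden illustration, not the method used in the program: the class name SentenceSplitSketch, the method split and the rule itself (a sentence ends at '.', '!' or '?', or at their full-width CJK counterparts) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceSplitSketch {

    // Assumed rule: a sentence is a run of non-terminator characters followed
    // by one or more terminators (. ! ? or full-width \u3002 \uFF01 \uFF1F)
    // and any closing quotes or brackets. Trailing text with no terminator is
    // kept as a final sentence.
    private static final Pattern SENTENCE = Pattern.compile(
        "[^.!?\u3002\uFF01\uFF1F]+(?:[.!?\u3002\uFF01\uFF1F]+[\"')\\]]*|$)");

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            String s = m.group().trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : split("Hello world. How are you? Fine")) {
            System.out.println(s);
        }
    }
}
```

A rule this simple splits wrongly after abbreviations such as "Dr." and cannot find a boundary where the punctuation mark is missing, so it should be read only as a starting point.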
For details, please have a look at the webpage.
Scott Piao
------------------------------------
Dept. of Computer Science
University of Sheffield
Email: s.piao at dcs.shef.ac.uk