[Corpora-List] EMILLE Tools Release
Mcenery, Tony
eiaamme at exchange.lancs.ac.uk
Fri Feb 20 11:58:56 UTC 2004
Dear All,
Apologies if you receive multiple copies of this message, especially if
you have no interest whatsoever in its contents.
Following a number of requests, I have decided to mount the EMILLE
character encoding conversion software (unicodify) on the EMILLE
download site (http://www.ling.lancs.ac.uk/corplang/emille/default.htm).
The conversion software was developed at Lancaster University. It
converts roughly 30 different 8-bit encodings of South Asian scripts,
commonly found both in publishing and on the web, into 16-bit
little-endian Unicode. The software is particularly useful if you plan
to collect South Asian corpus data from the web. As with the EMILLE
corpus, the software may be used freely for non-commercial research.
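For readers unfamiliar with this kind of conversion, the general idea
can be sketched in a few lines of Python. This is only an illustration
and not the unicodify implementation: the mapping table, the byte
values, and the function name legacy_to_utf16le are all invented for
the example. Real legacy encodings usually also need glyph reordering
and multi-byte sequences, which the sketch ignores.

```python
# Hypothetical mapping for an invented 8-bit encoding.
# These byte values are assumptions for illustration, not from unicodify.
LEGACY_TO_UNICODE = {
    0xA1: "\u0915",  # DEVANAGARI LETTER KA
    0xA2: "\u0916",  # DEVANAGARI LETTER KHA
    0xA3: "\u093E",  # DEVANAGARI VOWEL SIGN AA
}


def legacy_to_utf16le(data: bytes) -> bytes:
    """Map each legacy byte to a Unicode character, then encode as UTF-16LE."""
    chars = []
    for b in data:
        if b < 0x80:
            # Assumption: the lower half of the encoding is plain ASCII.
            chars.append(chr(b))
        else:
            # Unmapped bytes become U+FFFD (REPLACEMENT CHARACTER).
            chars.append(LEGACY_TO_UNICODE.get(b, "\ufffd"))
    return "".join(chars).encode("utf-16-le")


converted = legacy_to_utf16le(b"\xa1\xa3 ok")
```

Each character in the result occupies two bytes with the low byte
first, which is what "16-bit little-endian" means here.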
Also, an Urdu POS tagger is now mounted on the EMILLE download site.
Again, it is free for use in non-commercial research.
Both downloads include documentation etc.
Enjoy!
Tony McEnery,
Professor of English Language and Linguistics,
Dept. Linguistics and Modern English Language,
Lancaster University,
Bailrigg,
Lancaster,
LA1 4YT.