Corpora: Corpora in Uzbek - Summary

Michael Audenaert aggiemedic01 at yahoo.com
Thu Jul 5 13:52:57 UTC 2001


All,
Here is a summary of the responses that I received to my
post regarding corpora in Uzbek.  Many thanks to
everyone for their help with this, and my apologies for
being so late in getting this out.

Kevin McTait said:
Try the ECI (European Corpus Initiative) CD-ROM. On it
there is an English-Uzbek corpus in the form of a
novel (cannot remember which one). The Uzbek is
transliterated into the Latin script, though.

Ramesh recommended the TELRI TRACTOR archive:
http://www.tractor.de or http://www.telri.de

Trond Trosterud recommended the U. of Helsinki:
http://www.ling.helsinki.fi/uhlcs/

A little bit of poking around in directions suggested
by Tomaz Erjavec turned up the following sites:

Two links from the University of Leiden, which can be
found here:
http://iias.leidenuniv.nl/kreeft/IIASNONLINE/Newsletters/Newsletter10/Regional/Contents.html#AnchorCA

The Central Asian Languages Corpora
"The Uzbek corpus was completed in 1996. It contains
1,100,000 tokens approximately in 23 corpus texts from
388 different modern published sources." (from the
site)
http://www.let.uu.nl/oosters/CALC1.html

The LDC may have some data, though I didn't find any in
my very quick search and didn't look further after that.
http://www.ldc.upenn.edu/


Neal Audenaert
neal_audenaert at acm.org



